Abstract

In the classical compress-and-forward relay scheme developed by Cover and El Gamal (1979), the decoding
process operates successively: the destination first decodes the compression of the relay's observation, and
then decodes the original message of the source. Recently, several modified compress-and-forward relay schemes
were proposed in which the destination decodes the compression and the message jointly instead of successively.
This modification of the decoding process was motivated by the observation that it is generally easier to decode
the compression jointly with the original message and, more importantly, that the original message can be decoded
even without completely decoding the compression. Thus, joint decoding provides more freedom in choosing the
compression at the relay.

However, the question remains whether this freedom in selecting the compression necessarily improves the
achievable rate of the original message. It has been shown by El Gamal and Kim (2010) that the answer is
negative in the single-relay case. In this paper, it is further demonstrated that in the case of multiple relays,
joint decoding does not improve the achievable rate either. More interestingly, it is discovered that any
compressions not supporting successive decoding will actually lead to strictly lower achievable rates for the original
message. Therefore, to maximize the achievable rate for the original message, the compressions should always be
chosen to support successive decoding. Furthermore, it is shown that any compressions that are not completely decodable
even with joint decoding will not contribute anything to the decoding of the original message.

The above phenomenon is also shown to exist under the repetitive encoding framework recently proposed
by Lim, Kim, El Gamal, and Chung (2010), which improved the achievable rate in the case of multiple relays.
Here, another interesting discovery is that the improvement is not a result of repetitive encoding, but a benefit of
delaying the decoding until all the blocks have been finished. The same rate is shown to be achievable with the simpler
classical encoding process of Cover and El Gamal (1979) combined with a block-by-block backward decoding process.

I. Introduction

The relay channel, originally proposed in [1], models a communication scenario in which a relay
node can help the information transmission between the source and the destination. Two fundamentally
different relay strategies have been developed in [2] for such channels, which, depending on whether the
relay decodes the information or not, are generally known as decode-and-forward and compress-and-forward,
respectively. The compress-and-forward relay strategy is used when the relay cannot decode
the message sent by the source, but can still help by compressing and forwarding its observation to the
destination. Specifically, consider the relay channel depicted in Fig. 1. The relay compresses its observation
$Y_1$ into $\hat{Y}_1$, and then forwards $\hat{Y}_1$ to the destination via $X_1$. To reduce the rate loss caused by the delay,
block Markov coding was used in [2], and more blocks lead to less loss.

In this paper, based on the differences in the detailed encoding/decoding processes, the following five 
different compress-and-forward relay schemes will be considered. 

• Cumulative encoding/block-by-block forward decoding/compression-message successive decoding; 

• Cumulative encoding/block-by-block forward decoding/compression-message joint decoding; 

• Repetitive encoding/all blocks united decoding/compression-message joint decoding; 

• Cumulative encoding/block-by-block backward decoding/compression-message successive decoding; 

• Cumulative encoding/block-by-block backward decoding/compression-message joint decoding. 

Fig. 1. The single relay channel.

The cumulative encoding/block-by-block forward decoding/compression-message successive decoding scheme
refers to the original compress-and-forward scheme developed in [2]. The encoding is "cumulative" in the
sense that in each new block, a new piece of information is encoded at the source. This distinguishes it from
the "repetitive" encoding process recently proposed in [3], where the same information is encoded in each
block. The decoding is named "block-by-block forward" to distinguish it from the other two choices, where
the decoding starts only after all the blocks have been finished, either by decoding with all the blocks
together, or by decoding block by block backward. The decoding is also called "compression-message
successive" in the sense that the destination first decodes the compression of the relay's observation, and
then decodes the original message. The compression $\hat{Y}_1$ can first be recovered at the destination, as long
as the following constraint is satisfied:

$$ I(X_1; Y) > I(Y_1; \hat{Y}_1 \mid X_1, Y). \tag{1} $$

Then, based on $\hat{Y}_1$ and $Y$, the destination can decode the original message $X$ if the rate of the original
message satisfies

$$ R < I(X; Y, \hat{Y}_1 \mid X_1). \tag{2} $$

The above two-step compression-message successive decoding process requires $\hat{Y}_1$ to be decoded first.
This facilitates the decoding of $X$, but is not a requirement of the original problem. Recognizing this,
a joint compression-message decoding process was proposed in [4], where, instead of successively, the
destination decodes $\hat{Y}_1$ and $X$ together. It turns out that the decoding of $X$ can be helped even if $\hat{Y}_1$
cannot be decoded first. In fact, with joint decoding, the constraint (1) is not necessary, and instead of
(2), the achievable rate is expressed as

$$ R < I(X; Y, \hat{Y}_1 \mid X_1) - \max\{0,\; I(Y_1; \hat{Y}_1 \mid X_1, Y) - I(X_1; Y)\}. \tag{3} $$

Moreover, although $\hat{Y}_1$ is not even required to be decoded eventually, it can be more easily decoded by
joint decoding, and instead of (1), we need a less strict constraint:

$$ I(X_1; Y) > I(Y_1; \hat{Y}_1 \mid X_1, Y, X), \tag{4} $$

where the assistance provided by $X$ is clearly visible.
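The relation between the successive-decoding bound (2) and the joint-decoding bound (3) can be checked numerically with a small sketch. The function and argument names below are illustrative assumptions, and the three mutual-information values are taken as plain numbers (in bits) supplied by the caller; the sketch only shows that (3) coincides with (2) whenever (1) holds and is penalized otherwise.

```python
def successive_rate(i_x_yyhat, i_x1_y, i_y1_yhat1):
    """Rate bound (2), available only if constraint (1) holds.

    i_x_yyhat  : I(X; Y, Yhat_1 | X_1)
    i_x1_y     : I(X_1; Y)
    i_y1_yhat1 : I(Y_1; Yhat_1 | X_1, Y)
    Returns None when (1) fails, i.e. successive decoding is not supported.
    """
    if i_x1_y > i_y1_yhat1:
        return i_x_yyhat
    return None


def joint_rate(i_x_yyhat, i_x1_y, i_y1_yhat1):
    """Rate bound (3) for joint compression-message decoding."""
    return i_x_yyhat - max(0.0, i_y1_yhat1 - i_x1_y)
```

With hypothetical values 1.5, 1.0 and 0.8 both functions return 1.5; raising the third value to 1.2 makes `successive_rate` return `None`, while `joint_rate` drops to about 1.3.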

Similar formulas to (3) have been derived with different arguments in [5]-[7].¹

Therefore, compared to successive decoding, joint compression-message decoding provides more freedom
in choosing the compression $\hat{Y}_1$. However, the question remains whether joint decoding achieves
strictly higher rates for the original message than successive decoding. For the single-relay case, it has
been proved in [7] that the answer is negative, and any rate achievable by either of them can always
be achieved by the other. In this paper, we further consider the case of multiple relays, as
depicted in Fig. 2, and demonstrate that joint decoding cannot achieve any higher rates either.
More interestingly, we will show that any compressions not supporting successive decoding will actually
result in strictly lower achievable rates for the original message. Therefore, to optimize the achievable
rate, the compressions should always be chosen so that successive decoding can be carried out.

Recently, a different encoding process was proposed in [3], where, instead of piece by piece, all the
information is encoded in each block, and different blocks use independent codebooks to transmit the
'The formula and proof in missed a Y, and were later corrected in 0. 
Fig. 2. The multiple-relay channel.



same information. Compared to cumulative encoding, this repetitive encoding process appears to introduce
collaboration among all the blocks, so that all the blocks can jointly contribute to the decoding of the
same message. This repetitive encoding/all blocks united decoding process was combined with joint
compression-message decoding in [3], and although no improvement was shown in the single-relay case,
some interesting improvement in the achievable rate was obtained in the case of multiple relays. In
this paper, we will show that it is actually not necessary to use repetitive encoding to introduce such
collaboration among the blocks. The same rate can be achieved with cumulative encoding as long as
the decoding starts after all the blocks have been finished. We will show that the same achievable rate
can be obtained either by all blocks united decoding or by block-by-block backward decoding. Therefore,
in terms of complexity, cumulative encoding/block-by-block backward decoding provides the simplest
way to achieve the highest rate in the case of multiple relays.

Similarly, for these new encoding/decoding schemes, we will also show that the optimal compressions
must be able to support successive compression-message decoding, and any compressions not supporting
successive decoding will necessarily lead to strictly lower achievable rates than the optimal. Therefore,
for any of the compress-and-forward relay schemes mentioned above, we can restrict our attention
to successive compression-message decoding in the search for the optimal compressions of the relays'
observations.² Of course, it should be noted that any compressions supporting successive decoding also
support joint decoding.

Although the compressions supporting successive decoding can be explicitly characterized as we will 
show later, it is also of interest to consider other compressions not supporting successive decoding. For 
example, in a network with multiple destinations, when a relay is simultaneously helping more than one
destination, it is very likely that different destinations require different optimal compressions from the
relay. In such a situation, the relay may have to find a tradeoff between these requirements, i.e., adopting 
a compression which may be too coarse for some destinations, but too fine, thus not supporting successive 
decoding, for the others. An example of this tradeoff to optimize the sum rate was given for the two-way 
relay channel in [Jj. Another possibility of using too coarse or too fine compressions is when there is 
channel uncertainty, e.g., in wireless fading channels, so that it is impossible to accurately determine the 
optimal compressions even with explicit formulas. Therefore, it is of interest to study how coarser or finer
compressions than the optimal affect the achievable rate of the original message [9].

It is not surprising that coarser compressions than the optimal do not fully exploit the capability of the
relay, thus leading to lower achievable rates for the original message. However, it may not be so obvious
why finer compressions will also lead to lower achievable rates. For this, one needs to realize that a
relay's observation not only carries information about the original message, but also reflects the dynamics
of the source-relay link, which is unrelated to the original message. Thus, compared to the direct link
between the source and the destination, the support provided by the relay-destination link is not so pure. When
the compression is so fine that only joint compression-message decoding can be carried out, i.e., the
direct source-destination link has to make a sacrifice, the gain does not make up for the loss. Furthermore, in the
extreme case where the compression cannot be decoded even with joint decoding, the relay-destination link
becomes useless, and the destination would rather simply treat the relay's input as pure noise in the
decoding, as we will demonstrate in the paper.

² Part of the results were presented in [8].

The remainder of the paper is organized as follows. In Section II, we formally state our problem
setup and summarize the main results. Then, in Sections III and IV, detailed proofs of the
achievability results, as well as thorough discussions on the optimal choice of the relays' compressions,
are presented under the two different frameworks of block-by-block forward decoding and decoding after
all the blocks have been finished, respectively. Finally, some concluding remarks are included in Section V.

II. Main Results

Consider the multiple-relay channel depicted in Fig. 2, which can be denoted by

$$ (\mathcal{X} \times \mathcal{X}_1 \times \cdots \times \mathcal{X}_n,\; p(y, y_1, \ldots, y_n \mid x, x_1, \ldots, x_n),\; \mathcal{Y} \times \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n), $$

where $\mathcal{X}, \mathcal{X}_1, \ldots, \mathcal{X}_n$ are the transmitter alphabets of the source and the relays respectively, $\mathcal{Y}, \mathcal{Y}_1, \ldots, \mathcal{Y}_n$
are the receiver alphabets of the destination and the relays respectively, and $p(\cdot, \ldots, \cdot \mid x, x_1, \ldots, x_n)$ is a collection of probability
distributions on $\mathcal{Y} \times \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$, one for each $(x, x_1, \ldots, x_n) \in \mathcal{X} \times \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$.
The interpretation is that $x$ is the input to the channel from the source, $y$ is the output of the channel to
the destination, and $y_i$ is the output received by the $i$-th relay. The $i$-th relay sends an input $x_i$ based on
what it has received:

$$ X_i(t) = f_{i,t}(Y_i(t-1), Y_i(t-2), \ldots), \quad \text{for every time } t, \tag{5} $$

where $f_{i,t}(\cdot)$ can be any causal function.

Before presenting the main results, we introduce some simplified notation. Denote the set $\mathcal{N} =
\{1, 2, \ldots, n\}$, and for any subset $\mathcal{S} \subseteq \mathcal{N}$, let $X_{\mathcal{S}} = \{X_i, i \in \mathcal{S}\}$, and use similar notation for other
variables. The main results of the paper are presented in the following two different decoding frameworks:
i) Block-by-block forward decoding; ii) Decoding after all the blocks have been finished, which includes
all blocks united decoding and block-by-block backward decoding.

A. Block-by-block forward decoding 

Under the block-by-block forward decoding framework, the achievable rate with successive compression-message
decoding and the achievable rate with joint compression-message decoding are presented in
Theorems 2.1 and 2.2 respectively. Then the optimality of successive decoding is stated in Theorem 2.3,
and it is shown that the optimal rate can be achieved only if the compressions at the relays are chosen
such that they can be first decoded at the destination, i.e., successive compression-message decoding can
be carried out. All the related proofs are presented in Section III.

Theorem 2.1: For the multiple-relay channel depicted in Fig. 2, by the cumulative encoding/block-by-block
forward decoding/compression-message successive decoding scheme, a rate $R_{C/F/S}$ is achievable if
for some

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n), $$

there exists a rate vector $\{R_i, i = 1, \ldots, n\}$ satisfying

$$ \sum_{i \in \mathcal{S}_1} R_i < I(X_{\mathcal{S}_1}; Y \mid X_{\mathcal{S}_1^c}) \tag{6} $$

for any subset $\mathcal{S}_1 \subseteq \mathcal{N}$, such that for any subset $\mathcal{S} \subseteq \mathcal{N}$,

$$ I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y, X_{\mathcal{N}}) \le \sum_{i \in \mathcal{S}} R_i \tag{7} $$

and

$$ R_{C/F/S} < I(X; \hat{Y}_{\mathcal{N}}, Y \mid X_{\mathcal{N}}). \tag{8} $$

Theorem 2.2: For the multiple-relay channel depicted in Fig. 2, by the cumulative encoding/block-by-block
forward decoding/compression-message joint decoding scheme, a rate $R_{C/F/J}$ is achievable if for
some

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n), $$

there exists a rate vector $\{R_i, i = 1, \ldots, n\}$ satisfying

$$ \sum_{i \in \mathcal{S}_1} R_i < I(X_{\mathcal{S}_1}; Y \mid X_{\mathcal{S}_1^c}) \tag{9} $$

for any subset $\mathcal{S}_1 \subseteq \mathcal{N}$, such that for any subset $\mathcal{S} \subseteq \mathcal{N}$,

$$ R_{C/F/J} < I(X; \hat{Y}_{\mathcal{N}}, Y \mid X_{\mathcal{N}}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y, X_{\mathcal{N}}) + \sum_{i \in \mathcal{S}} R_i. \tag{10} $$

Let $R^*_{C/F/S}$ and $R^*_{C/F/J}$ be the suprema of the achievable rates stated in Theorems 2.1 and 2.2 respectively.

Theorem 2.3: $R^*_{C/F/S} = R^*_{C/F/J}$, and $R^*_{C/F/J}$ can be obtained only when the distribution

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n) $$

is chosen such that there exists a rate vector $\{R_i, i = 1, \ldots, n\}$ satisfying (6)-(7).
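The subset conditions in Theorems 2.1 and 2.2 can be evaluated by brute force for a small number of relays. The sketch below is only an illustration: the function names are assumptions, the conditional mutual information in (7) and (10) is passed in as a caller-supplied function, equality is allowed in the check of (7), and the toy numbers use an additive per-relay cost, which the true quantity generally is not.

```python
from itertools import combinations


def subsets(n):
    """Yield every subset of {1, ..., n} as a frozenset, including the empty set."""
    items = range(1, n + 1)
    for k in range(n + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def supports_successive(n, rates, cost):
    """Check constraint (7): cost(S) <= sum of R_i over S, for every subset S.

    rates : dict mapping relay index i to R_i
    cost  : callable S -> I(Y_S; Yhat_S | Yhat_{S^c}, Y, X_N), supplied by the caller
    """
    return all(cost(s) <= sum(rates[i] for i in s) for s in subsets(n))


def joint_rate_bound(n, rates, cost, full_rate):
    """Right-hand side of (10), minimised over all subsets S.

    full_rate : I(X; Yhat_N, Y | X_N)
    """
    return min(full_rate - cost(s) + sum(rates[i] for i in s) for s in subsets(n))


# Toy numbers (hypothetical): relay 2 is compressed too finely for its rate.
rates = {1: 1.0, 2: 0.5}
per_relay_cost = {1: 0.7, 2: 0.9}
cost = lambda s: sum(per_relay_cost[i] for i in s)
ok = supports_successive(2, rates, cost)
bound = joint_rate_bound(2, rates, cost, 2.0)
```

In this toy instance (7) fails, and the joint-decoding bound falls below the value 2.0 that would be available if the compressions supported successive decoding; when (7) holds, the minimum is attained at the empty subset and the bound equals `full_rate`.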



B. Decoding after all the blocks have been finished 

It was shown in [3] that the original cumulative encoding/block-by-block forward decoding/compression-message
successive decoding scheme developed in [2] can be improved to achieve higher rates in the case
of multiple relays, although no improvement was obtained in the case of a single relay. In the new
compress-and-forward relay scheme of [3], cumulative encoding was replaced by repetitive encoding, and
block-by-block forward decoding was replaced by all blocks united decoding. Joint instead
of successive compression-message decoding was also used. For the single-source multiple-relay channel depicted in Fig. 2,
Theorem 1 in [3] can be restated as the following theorem.

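Theorem 2.4: For the multiple-relay channel depicted in Fig. 2, a rate $R_{R/U/J}$ is achievable if there exists
some

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n), $$

such that

$$ R_{R/U/J} < \min_{\mathcal{S} \subseteq \mathcal{N}} I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y \mid X_{\mathcal{S}^c}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}). \tag{11} $$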

In this paper, we will show that the improvement is not a result of replacing cumulative encoding by 
repetitive encoding, but actually, is a benefit obtained when the decoding is delayed, i.e., only starts after 
all the blocks have been finished. Besides all blocks united decoding, we will show that block-by-block 
backward decoding also achieves the same improvement since it also starts the decoding after all the 
blocks have been finished. 

Similar to the framework of block-by-block forward decoding, we will also show that for these 
new schemes with decoding after all the blocks have been finished, the optimal rate can be achieved 
only when the compressions at the relays are chosen such that successive compression-message de- 
coding can be carried out. Thus, in terms of complexity, cumulative encoding/block-by-block backward 
decoding/compression-message successive decoding is the simplest choice in achieving the highest rate 
in the case of multiple relays. The corresponding achievable rate is presented in the following theorem. 

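Theorem 2.5: For the multiple-relay channel depicted in Fig. 2, a rate $R_{C/B/S}$ is achievable if there exists
some

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n), $$

such that for any subset $\mathcal{S} \subseteq \mathcal{N}$,

$$ I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y \mid X_{\mathcal{S}^c}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}) \ge 0, \tag{12} $$

and

$$ R_{C/B/S} < I(X; \hat{Y}_{\mathcal{N}}, Y \mid X_{\mathcal{N}}). \tag{13} $$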



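Let $R^*_{R/U/J}$ and $R^*_{C/B/S}$ be the suprema of the achievable rates stated in Theorems 2.4 and 2.5 respectively,
i.e.,

$$ R^*_{R/U/J} := \max_{p(x) \prod_{i=1}^{n} p(x_i) p(\hat{y}_i \mid x_i, y_i)} \; \min_{\mathcal{S} \subseteq \mathcal{N}} \left\{ I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y \mid X_{\mathcal{S}^c}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}) \right\} $$

and

$$ R^*_{C/B/S} := \max_{p(x) \prod_{i=1}^{n} p(x_i) p(\hat{y}_i \mid x_i, y_i)} I(X; \hat{Y}_{\mathcal{N}}, Y \mid X_{\mathcal{N}}) $$

$$ \text{such that } I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y \mid X_{\mathcal{S}^c}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}) \ge 0, \quad \forall \mathcal{S} \subseteq \mathcal{N}. \tag{14} $$

The optimality of successive decoding is demonstrated in the following theorem.

Theorem 2.6: $R^*_{R/U/J} = R^*_{C/B/S}$, and $R^*_{R/U/J}$ can be obtained only when the distribution

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n) $$

is chosen such that (14) holds.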



As mentioned in the Introduction, although the optimal rate is achieved only when successive decoding
can be supported, there are situations where it is of interest to consider other compressions not supporting
successive decoding. Hence, more generally, we will use the cumulative encoding/block-by-block backward
decoding/compression-message joint decoding scheme. The corresponding achievable rate is given in the
following theorem.

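Theorem 2.7: For the multiple-relay channel depicted in Fig. 2, with a given distribution

$$ p(x)p(x_1) \cdots p(x_n)\, p(\hat{y}_1 \mid y_1, x_1) \cdots p(\hat{y}_n \mid y_n, x_n), $$

a rate $R_{C/B/J}$ is achievable if

$$ R_{C/B/J} < \min_{\mathcal{S} \subseteq \mathcal{D}_J} I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}, Y \mid X_{\mathcal{D}_J \setminus \mathcal{S}}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\mathcal{D}_J}, Y, \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}), \tag{15} $$

where $\mathcal{D}_J$ is the unique largest subset of $\mathcal{N}$ satisfying

$$ I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}, Y \mid X, X_{\mathcal{D}_J \setminus \mathcal{S}}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\mathcal{D}_J}, Y, \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}) > 0, \tag{16} $$

for any nonempty $\mathcal{S} \subseteq \mathcal{D}_J$. In addition, $\hat{Y}_{\mathcal{D}_J}$ can be decoded jointly with $X$.
There also exists a unique largest subset $\bar{\mathcal{D}}_J \subseteq \mathcal{N}$ satisfying

$$ I(X_{\mathcal{S}}; \hat{Y}_{\bar{\mathcal{D}}_J \setminus \mathcal{S}}, Y \mid X, X_{\bar{\mathcal{D}}_J \setminus \mathcal{S}}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\bar{\mathcal{D}}_J}, Y, \hat{Y}_{\bar{\mathcal{D}}_J \setminus \mathcal{S}}) \ge 0, \tag{17} $$

for any $\mathcal{S} \subseteq \bar{\mathcal{D}}_J$. It will be clear from the proof of Theorem 2.7 that the compressions of the relays in
$\mathcal{N} \setminus \bar{\mathcal{D}}_J$ are not decodable even jointly with the message.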



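On the other hand, the achievable rate (11) can be more generally expressed as

$$ R_{R/U/J} < \min_{\mathcal{S} \subseteq \mathcal{M}} I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{M} \setminus \mathcal{S}}, Y \mid X_{\mathcal{M} \setminus \mathcal{S}}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\mathcal{M}}, Y, \hat{Y}_{\mathcal{M} \setminus \mathcal{S}}) \tag{18} $$

if we only consider a subset of relays $\mathcal{M} \subseteq \mathcal{N}$ for the decoding, while treating the other inputs as
pure noise. Interestingly, the following theorem implies that $\mathcal{M} = \mathcal{N}$ may not be the optimal choice
to maximize the R.H.S. (right-hand side) of (18), i.e., sometimes it is better to consider only a subset of
relays.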

Theorem 2.8: For any $p(x) \prod_{i=1}^{n} p(x_i) p(\hat{y}_i \mid x_i, y_i)$, among all the choices of $\mathcal{M} \subseteq \mathcal{N}$, the R.H.S. of
(18) is maximized when $\mathcal{M} = \mathcal{D}_J$ or $\mathcal{M} = \bar{\mathcal{D}}_J$, but is strictly less than the maximum when $\mathcal{M} \nsubseteq \bar{\mathcal{D}}_J$.
Here, $\mathcal{D}_J$ and $\bar{\mathcal{D}}_J$ are defined as in (16) and (17).

Therefore, not only are the compressions of the relays in $\mathcal{N} \setminus \bar{\mathcal{D}}_J$ not decodable, but including
them in the formula (18), i.e., choosing $\mathcal{M} \nsubseteq \bar{\mathcal{D}}_J$, will even strictly lower the achievable rate.

By comparing (15) and (18) with $\mathcal{M} = \mathcal{D}_J$, Theorem 2.8 also implies that for any compressions chosen
at the relays, the cumulative encoding/block-by-block backward decoding/compression-message joint
decoding scheme achieves the same rate as the repetitive encoding/all blocks united decoding/compression-message
joint decoding scheme.

The proofs of Theorems 2.5-2.8 are presented in Section IV.



III. Block-by-Block Forward Decoding 



We first prove the achievability results stated in Theorems 2.1 and 2.2 respectively.

In both the cumulative encoding/block-by-block forward decoding/compression-message successive
decoding and the cumulative encoding/block-by-block forward decoding/compression-message joint decoding
schemes, the codebook generation and encoding processes are exactly the same as in the classical
approach, i.e., as in the proof of Theorem 6 of [2]. The two schemes differ only
in the decoding process at the destination: i) In successive decoding, the destination first finds, from the
specific bins sent by the relays via $X_1, X_2, \ldots, X_n$, the unique combination of $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_n$ sequences
that is jointly typical with the $Y$ sequence received, and then finds the unique $X$ sequence that is jointly
typical with the $Y$ sequence received, and also with the previously recovered $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_n$ sequences. ii)
In joint decoding, the destination finds the unique $X$ sequence that is jointly typical with the $Y$ sequence
received, and also with some combination of $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_n$ sequences from the specific bins sent by the
relays via $X_1, X_2, \ldots, X_n$.



A. A simplified model and proof of Theorem 2.1

To make the presentation easier to follow, we introduce a simplified channel model as depicted in Fig. 3,
where the relays are connected to the destination via error-free digital links with capacities $R_1, R_2, \ldots, R_n$,
where $(R_1, R_2, \ldots, R_n)$ are chosen based on (6). The $i$-th digital link plays the same role as the $X_i \to Y$
link in Fig. 2, for any $i = 1, 2, \ldots, n$. Such a replacement will not lead to any essential variation of
the original coding scheme, since under the original coding framework, the $X_i \to Y$ link is used as a
separate link to forward digital information. The benefit of directly replacing it by a digital link is that the
codebook construction for $\hat{Y}_i$ can be simplified, since no $X_i$ needs to be considered. For this simplified
model, (7) and (8) simplify to

$$ I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y) \le \sum_{i \in \mathcal{S}} R_i \tag{19} $$

and

$$ R_{C/F/S} < I(X; \hat{Y}_{\mathcal{N}}, Y). \tag{20} $$




Fig. 3. A simplified multiple-relay model with digital links. 



The basic idea of the compress-and-forward strategy is for the relay to compress its observations into
some approximations, which can be represented by fewer bits, and thus can be forwarded
to the destination. To deal with the delay at the relay, block Markov coding was used, where the total time
is divided into a sequence of blocks of equal length $T$, and coding is performed block by block. For
example, each relay compresses its observations of each block at the end of the block, and forwards the
approximations in the next block. Therefore, to decode the message sent by the source in any block, the
destination must wait until the end of the next block, when it has received the help from the relay.
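The one-block delay described above can be laid out as a simple schedule. This is a minimal sketch with hypothetical function names; the rate-loss formula assumes, as is common for block Markov schemes, that $B$ blocks carry $B - 1$ fresh messages, which is an assumption of this illustration rather than a statement taken from the text.

```python
def forward_schedule(num_blocks):
    """List (block, relay_action, destination_action) for block-by-block forward decoding.

    In block b the relay forwards the bin index of its block b-1 compression,
    and at the end of block b the destination decodes the block b-1 message.
    """
    schedule = []
    for b in range(1, num_blocks + 1):
        if b == 1:
            relay, dest = "nothing to forward yet", "wait"
        else:
            relay = f"forward compression of block {b - 1}"
            dest = f"decode message of block {b - 1}"
        schedule.append((b, relay, dest))
    return schedule


def effective_rate(rate_per_block, num_blocks):
    """Average rate when num_blocks blocks carry num_blocks - 1 messages (assumed)."""
    return rate_per_block * (num_blocks - 1) / num_blocks
```

Under this assumption the loss factor $(B-1)/B$ approaches 1 as the number of blocks grows, which matches the earlier remark that more blocks lead to less loss.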

The encoding process is exactly the same as that in the proof of Theorem 6 of [2]. We only emphasize
that the $i$-th relay needs to generate $2^{T(I(Y_i; \hat{Y}_i) + \epsilon)}$ sequences $\hat{\mathbf{Y}}_i$, and randomly throws them into
$2^{T R_i}$ bins. At the end of each block, the relay finds a $\hat{\mathbf{Y}}_i$ sequence that is jointly typical with the $\mathbf{Y}_i$
sequence it received during the block, and in the next block, informs the destination of the index of the bin
that contains the $\hat{\mathbf{Y}}_i$ sequence.
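The binning step can be mimicked at toy scale. In the sketch below, the sizes 64 and 8 are arbitrary stand-ins for the exponential numbers of sequences and bins, and all names are illustrative assumptions; the code shows only the bookkeeping (random assignment to bins, forwarding a bin index, and the candidate list left for the destination), not typicality testing.

```python
import random


def build_bins(num_sequences, num_bins, seed=0):
    """Throw sequence indices 0..num_sequences-1 uniformly at random into bins.

    Returns (bin_of, members): bin_of[s] is the bin holding sequence s,
    members[k] lists the sequences in bin k.
    """
    rng = random.Random(seed)
    bin_of = [rng.randrange(num_bins) for _ in range(num_sequences)]
    members = [[] for _ in range(num_bins)]
    for seq, k in enumerate(bin_of):
        members[k].append(seq)
    return bin_of, members


# Toy sizes standing in for the number of compression sequences and the number of bins.
bin_of, members = build_bins(num_sequences=64, num_bins=8)
chosen = 37                      # hypothetical index of the sequence picked by the relay
sent_bin = bin_of[chosen]        # only this bin index is forwarded in the next block
candidates = members[sent_bin]   # the destination must single out `chosen` using Y
```

The destination's task in the decoding step is then to identify the correct entry of `candidates` with the help of its own observation.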

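The decoding process operates in a successive way. At the end of each block $b = 2, 3, \ldots$, the destination
first finds, from the bins forwarded by the relays during block $b$, the unique combination of $\hat{\mathbf{Y}}_1, \hat{\mathbf{Y}}_2, \ldots, \hat{\mathbf{Y}}_n$
sequences that is jointly typical with the $\mathbf{Y}$ sequence received, i.e.,

$$ (\hat{\mathbf{Y}}_1(b-1), \ldots, \hat{\mathbf{Y}}_n(b-1), \mathbf{Y}(b-1)) \in A_\epsilon(\hat{Y}_{\mathcal{N}}, Y). \tag{21} $$

Error occurs if the true $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ does not satisfy (21), or a false $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ satisfies (21). According
to the properties of typical sequences, the true $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ satisfies (21) with high probability.

The probability of a false $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ with some false $\{\hat{\mathbf{Y}}_i(b-1), i \in \mathcal{S}\}$ but true $\{\hat{\mathbf{Y}}_i(b-1), i \in \mathcal{S}^c\}$
being jointly typical with $\mathbf{Y}(b-1)$ can be upper bounded by

$$ 2^{T(H(Y, \hat{Y}_{\mathcal{N}}) + \epsilon)}\, 2^{-T(H(Y, \hat{Y}_{\mathcal{S}^c}) - \epsilon)} \prod_{i \in \mathcal{S}} 2^{-T(H(\hat{Y}_i) - \epsilon)}. $$

There are $\prod_{i \in \mathcal{S}} (2^{T(I(Y_i; \hat{Y}_i) - R_i + \epsilon)} - 1)$ false $\hat{\mathbf{Y}}_{\mathcal{S}}(b-1)$ from the bins, thus the probability of finding such
a false $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ can be upper bounded by

$$ 2^{T(H(Y, \hat{Y}_{\mathcal{N}}) + \epsilon)}\, 2^{-T(H(Y, \hat{Y}_{\mathcal{S}^c}) - \epsilon)} \prod_{i \in \mathcal{S}} 2^{-T(H(\hat{Y}_i) - I(Y_i; \hat{Y}_i) + R_i - 2\epsilon)}, $$

which tends to zero for sufficiently small $\epsilon$ as $T \to \infty$, if

$$ H(\hat{Y}_{\mathcal{S}} \mid Y, \hat{Y}_{\mathcal{S}^c}) - \sum_{i \in \mathcal{S}} [H(\hat{Y}_i \mid Y_i) + R_i] < 0. \tag{22} $$

Letting $\mathcal{S} = \{i_j \in \mathcal{N} : j = 1, \ldots, |\mathcal{S}|\}$, we have

$$ \sum_{i \in \mathcal{S}} H(\hat{Y}_i \mid Y_i) = \sum_{j = 1, \ldots, |\mathcal{S}|} H(\hat{Y}_{i_j} \mid Y_{i_j}) = \sum_{j = 1, \ldots, |\mathcal{S}|} H(\hat{Y}_{i_j} \mid \hat{Y}_{i_1}, \ldots, \hat{Y}_{i_{j-1}}, Y_{\mathcal{S}}, Y, \hat{Y}_{\mathcal{S}^c}) = H(\hat{Y}_{\mathcal{S}} \mid Y_{\mathcal{S}}, Y, \hat{Y}_{\mathcal{S}^c}). $$

Plugging this into (22), we obtain (19).³

Given that (19) is satisfied, the destination can recover $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ at the end of block $b$. Then, based
on $\hat{\mathbf{Y}}_{\mathcal{N}}(b-1)$ and $\mathbf{Y}(b-1)$, $\mathbf{X}(w)$ can be recovered if (20) holds.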



³ The case of "=" can be included since (20) doesn't include "=". The same consideration applies throughout the paper.

B. Proof of Theorem 2.2

Similarly, we consider the simplified model depicted in Fig. 3, where the rates $(R_1, R_2, \ldots, R_n)$ are
chosen based on (9). Then, (10) simplifies to

$$ R_{C/F/J} < I(X; \hat{Y}_{\mathcal{N}}, Y) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y) + \sum_{i \in \mathcal{S}} R_i. \tag{23} $$

In cumulative encoding/block-by-block forward decoding/compression-message joint decoding, the encoding
part is exactly the same as that in the proof of Theorem 2.1, and the decoding process operates
as follows. At the end of each block $b = 2, 3, \ldots$, the destination finds the unique $\mathbf{X}$ sequence that
is jointly typical with the $\mathbf{Y}$ sequence received during block $b - 1$, and also with some $\hat{\mathbf{Y}}_1, \hat{\mathbf{Y}}_2, \ldots, \hat{\mathbf{Y}}_n$
sequences from the bins forwarded by the relays during block $b$, i.e.,

$$ (\mathbf{X}(w), \mathbf{Y}(b-1), \hat{\mathbf{Y}}_{\mathcal{N}}(b-1)) \in A_\epsilon(X, Y, \hat{Y}_{\mathcal{N}}). \tag{24} $$

Error occurs if the true $\mathbf{X}(w)$ does not satisfy (24), or a false $\mathbf{X}(w')$ satisfies (24). According to the
properties of typical sequences, the true $\mathbf{X}(w)$ satisfies (24) with high probability.

The probability of a false $\mathbf{X}(w')$ being jointly typical with $\mathbf{Y}(b-1)$ and some false $\{\hat{\mathbf{Y}}_i(b-1), i \in \mathcal{S}\}$
but true $\{\hat{\mathbf{Y}}_i(b-1), i \in \mathcal{S}^c\}$ can be upper bounded by

$$ 2^{T(H(X, Y, \hat{Y}_{\mathcal{N}}) + \epsilon)}\, 2^{-T(H(X) - \epsilon)}\, 2^{-T(H(Y, \hat{Y}_{\mathcal{S}^c}) - \epsilon)} \prod_{i \in \mathcal{S}} 2^{-T(H(\hat{Y}_i) - \epsilon)}. $$

There are $2^{TR} - 1$ false $w'$, and $\prod_{i \in \mathcal{S}} (2^{T(I(Y_i; \hat{Y}_i) - R_i + \epsilon)} - 1)$ false $\hat{\mathbf{Y}}_{\mathcal{S}}(b-1)$ from the bins, thus the
probability of finding such a false $\mathbf{X}(w')$ can be upper bounded by

$$ 2^{TR}\, 2^{T(H(X, Y, \hat{Y}_{\mathcal{N}}) + \epsilon)}\, 2^{-T(H(X) - \epsilon)} \times 2^{-T(H(Y, \hat{Y}_{\mathcal{S}^c}) - \epsilon)} \prod_{i \in \mathcal{S}} 2^{-T(H(\hat{Y}_i) - I(Y_i; \hat{Y}_i) + R_i - 2\epsilon)}, $$

which tends to zero for sufficiently small $\epsilon$ as $T \to \infty$, if (23) holds.



C. Optimality of successive decoding in block-by-block forward decoding 



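To make the proof of Theorem 2.3 easier to follow, we still consider the simplified model depicted in
Fig. 3. Then, $R^*_{C/F/S}$ and $R^*_{C/F/J}$ can be respectively written as

$$ R^*_{C/F/S} = \max_{p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)} I(X; \hat{Y}_{\mathcal{N}}, Y) \tag{25} $$

$$ \text{such that } I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y) - \sum_{i \in \mathcal{S}} R_i \le 0, \quad \forall \mathcal{S} \subseteq \mathcal{N}, \tag{26} $$

and

$$ R^*_{C/F/J} = \max_{p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)} \; \min_{\mathcal{S} \subseteq \mathcal{N}} \Big\{ I(X; \hat{Y}_{\mathcal{N}}, Y) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y) + \sum_{i \in \mathcal{S}} R_i \Big\}. \tag{27} $$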
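To see the trade-off in (25)-(27) at work, the single-relay case can be evaluated in closed form for a Gaussian toy model. Everything about this model is an assumption made for illustration (the theorems above are stated for general channel distributions): $Y = X + Z$, $Y_1 = X + Z_1$, $\hat{Y}_1 = Y_1 + W$ with independent zero-mean Gaussians of variances $P$, $N$, $N_1$ and $q$, and a digital link of rate $R_1$. The sketch sweeps the compression noise $q$ and locates the value that maximizes the joint-decoding objective in (27).

```python
import math


def rates(P, N, N1, R1, q):
    """Return (full, cost, joint) in bits for compression noise variance q.

    Assumed toy model (not from the paper): Y = X + Z, Y1 = X + Z1, Yhat1 = Y1 + W,
    with independent Gaussians X~N(0,P), Z~N(0,N), Z1~N(0,N1), W~N(0,q),
    and a digital relay-destination link of rate R1.
    """
    full = 0.5 * math.log2(1 + P / N + P / (N1 + q))   # I(X; Yhat1, Y)
    var_y1_given_y = N1 + P * N / (P + N)              # Var(Y1 | Y)
    cost = 0.5 * math.log2(1 + var_y1_given_y / q)     # I(Y1; Yhat1 | Y)
    joint = min(full, full - cost + R1)                # objective in (27) with n = 1
    return full, cost, joint


P, N, N1, R1 = 10.0, 1.0, 1.0, 1.0
# q_star is the finest compression that still satisfies (26), i.e. cost == R1.
q_star = (N1 + P * N / (P + N)) / (2 ** (2 * R1) - 1)
grid = [q_star * k / 10 for k in range(1, 31)]
best_q = max(grid, key=lambda q: rates(P, N, N1, R1, q)[2])
```

On this grid the maximizer coincides with `q_star`: finer compressions (smaller `q`) violate (26) and are penalized, while coarser ones waste the relay link, which is consistent with the claim of Theorem 2.3 in this toy setting.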



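Before proceeding to the proof of Theorem 2.3, we first introduce some useful notation and lemmas.
Let

$$ I_{\mathcal{A},\mathcal{B}}(\mathcal{S}) := \sum_{i \in \mathcal{S}} R_i - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y), \quad \forall \mathcal{S} \subseteq \mathcal{B}, \tag{28} $$

$$ I_{\mathcal{B}}(\mathcal{S}) := I_{\emptyset,\mathcal{B}}(\mathcal{S}) = \sum_{i \in \mathcal{S}} R_i - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y), \quad \forall \mathcal{S} \subseteq \mathcal{B}, \tag{29} $$

$$ I(\mathcal{S}) := I_{\mathcal{N}}(\mathcal{S}) = \sum_{i \in \mathcal{S}} R_i - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y), \quad \forall \mathcal{S} \subseteq \mathcal{N}. \tag{30} $$

Then, we have the following lemmas, whose proofs are given in Appendix A.

Lemma 3.1: 1) If $I_{\mathcal{A}}(\mathcal{S}_1) \ge 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $I_{\mathcal{B}}(\mathcal{S}_2) \ge 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then $I_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.
2) If $I_{\mathcal{A}}(\mathcal{S}_1) \ge 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $I_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) \ge 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then $I_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.

Lemma 3.2: Under any $p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)$, there exists a unique set $\mathcal{D}$, which is the largest subset of
$\mathcal{N}$ satisfying

$$ I_{\mathcal{D}}(\mathcal{S}) \ge 0, \quad \forall \mathcal{S} \subseteq \mathcal{D}. $$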
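For a small number of relays, the set of Lemma 3.2 can be found by exhaustive search. The following sketch is only illustrative: the function name is hypothetical, the quantity $I_{\mathcal{D}}(\mathcal{S})$ of (29) must be supplied by the caller, equality is allowed in the test (the empty subset always yields 0), and the toy numbers use an additive per-relay slack, which the true quantity generally is not.

```python
from itertools import combinations


def largest_supported_set(n, slack):
    """Brute-force search for the largest D within {1..n} with slack(S, D) >= 0 for all S in D.

    slack : callable (S, D) -> I_D(S) as in (29), supplied by the caller.
    Candidate sets are scanned from largest to smallest, so the first hit is the largest.
    """
    items = range(1, n + 1)
    for size in range(n, -1, -1):
        for combo in combinations(items, size):
            d = frozenset(combo)
            inner = (frozenset(s)
                     for k in range(len(d) + 1)
                     for s in combinations(sorted(d), k))
            if all(slack(s, d) >= 0 for s in inner):
                return d
    return frozenset()


# Toy numbers (hypothetical): relay 2 is compressed too finely for its link rate.
link_rate = {1: 1.0, 2: 0.5, 3: 0.8}
per_relay_cost = {1: 0.7, 2: 0.9, 3: 0.8}
toy_slack = lambda s, d: sum(link_rate[i] - per_relay_cost[i] for i in s)
D = largest_supported_set(3, toy_slack)
```

Here the search discards relay 2 and returns the set containing relays 1 and 3. The exhaustive scan is exponential in the number of relays and is meant only to make the definition concrete.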



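Lemma 3.3: If $I_{\mathcal{A},\mathcal{B}}(\mathcal{B}) \ge 0$ for some nonempty $\mathcal{B}$, then there exists some nonempty $\mathcal{C} \subseteq \mathcal{B}$ such that
$I_{\mathcal{A},\mathcal{C}}(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{C}$.

Lemma 3.4: For any $\mathcal{A}$ and $\mathcal{B}$ with $\mathcal{A} \cap \mathcal{B} = \emptyset$, $I(\mathcal{A}) + I(\mathcal{B}) = I(\mathcal{A} \cup \mathcal{B}) + I(\hat{Y}_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} \mid \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y)$.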

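We are now ready to prove Theorem 2.3.

Proof of Theorem 2.3: We show $R^*_{C/F/S} = R^*_{C/F/J}$ by showing that $R^*_{C/F/S} \le R^*_{C/F/J}$ and $R^*_{C/F/S} \ge R^*_{C/F/J}$
respectively. Under any $p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)$ such that $I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y) \le \sum_{i \in \mathcal{S}} R_i$, $\forall \mathcal{S} \subseteq \mathcal{N}$, we have

$$ \min_{\mathcal{S} \subseteq \mathcal{N}} \Big\{ I(X; \hat{Y}_{\mathcal{N}}, Y) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid \hat{Y}_{\mathcal{S}^c}, Y) + \sum_{i \in \mathcal{S}} R_i \Big\} = I(X; \hat{Y}_{\mathcal{N}}, Y), $$

and thus $R^*_{C/F/S} \le R^*_{C/F/J}$.

To show $R^*_{C/F/S} \ge R^*_{C/F/J}$, it is sufficient to show that $R^*_{C/F/J}$ can be achieved only with $p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)$
such that $I(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$. We will show this in two steps as follows: i) We first show that under
any $p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)$, if $\mathcal{D}^c \ne \emptyset$, then $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$, where $\mathcal{D}$ is
defined as in Lemma 3.2 and

$$ \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S}) := \{\mathcal{T} \subseteq \mathcal{N} : I(\mathcal{T}) = \min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})\}. $$

ii) We then argue that, under the optimal $p(x) \prod_{i=1}^{n} p(\hat{y}_i \mid y_i)$, $\mathcal{D}^c$ must be $\emptyset$, i.e., $\mathcal{D}$ must be $\mathcal{N}$, and thus by the definition of $\mathcal{D}$,
$I(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$.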

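i) Assuming $\mathcal{D}^c \ne \emptyset$ throughout Part i), we show $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$.

1) We first show $I(\mathcal{D}^c) < 0$ by using a contradiction argument. Suppose $I(\mathcal{D}^c) \ge 0$, i.e., $I_{\mathcal{D},\mathcal{D}^c}(\mathcal{D}^c) \ge 0$.
Then, by Lemma 3.3, there exists some nonempty $\mathcal{B} \subseteq \mathcal{D}^c$ such that $I_{\mathcal{D},\mathcal{B}}(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{B}$.
This will further imply, by Part 2) of Lemma 3.1, that $I_{\mathcal{D} \cup \mathcal{B}}(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{D} \cup \mathcal{B}$. This contradicts
the definition of $\mathcal{D}$, and thus $I(\mathcal{D}^c) < 0$.

2) We show that $\forall \mathcal{A} \subseteq \mathcal{D}^c$ and $\mathcal{A} \ne \mathcal{D}^c$, $I(\mathcal{A}) > I(\mathcal{D}^c)$, and thus $I(\mathcal{A}) > \min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$. The proof
is still by contradiction. Suppose that there exists some $\mathcal{A} \subseteq \mathcal{D}^c$ and $\mathcal{A} \ne \mathcal{D}^c$ such that $I(\mathcal{A}) \le I(\mathcal{D}^c)$.
Then $I(\mathcal{D}^c) - I(\mathcal{A}) \ge 0$, i.e.,

$$ \sum_{i \in \mathcal{D}^c} R_i - I(Y_{\mathcal{D}^c}; \hat{Y}_{\mathcal{D}^c} \mid \hat{Y}_{\mathcal{D}}, Y) - \sum_{i \in \mathcal{A}} R_i + I(Y_{\mathcal{A}}; \hat{Y}_{\mathcal{A}} \mid \hat{Y}_{\mathcal{A}^c}, Y) $$
$$ = \sum_{i \in \mathcal{D}^c \setminus \mathcal{A}} R_i - I(Y_{\mathcal{D}^c \setminus \mathcal{A}}; \hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}} \mid \hat{Y}_{\mathcal{D}}, Y) $$
$$ = I_{\mathcal{D}, \mathcal{D}^c \setminus \mathcal{A}}(\mathcal{D}^c \setminus \mathcal{A}) $$
$$ \ge 0. $$

Again by Lemmas 3.3 and 3.1 successively, we can conclude that there exists some nonempty $\mathcal{B} \subseteq \mathcal{D}^c \setminus \mathcal{A}$
such that $I_{\mathcal{D} \cup \mathcal{B}}(\mathcal{S}) \ge 0$, $\forall \mathcal{S} \subseteq \mathcal{D} \cup \mathcal{B}$, which is a contradiction. Therefore, $I(\mathcal{A}) > I(\mathcal{D}^c) \ge \min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$.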

3) We prove that WA with AD ^ and AV C ^ V c , 1(A) > min lS c^7(5). Let A x = AD and 
A2 = AD C . Then, we have, by Lemma 3.4, that 

1(A) =I(At U A 2 ) = 7(A) + I(A 2 ) - I(Y Al ;Y A2 \Y A .,Y), 
I(Ai U V c ) =7(A) + IiV c ) - I(Y Al ;Y vc \Y (AlUV c )c , Y). 
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Since I(A 2 ) > I(V C ) by 2) and 

I(Y Al ;Y vc \Y (Al[jVc)c ,Y) 

=i(Xai ; Yt>c\a 2 I Y{Ai u v c y ' ^) + i(Yai ; Xa 2 I Y(.ai u » c ) c > ^°\^ 2 > y) 
=i(y Ai ;Y m \y A c, y) + /(y^jy^jy^u^c, Y) 

>I{Y Al ;Y A2 \Y Ac ,Y), 

we have > I{Ay U £> c ) > min^Af 1(5). 

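4) We prove that $\forall \mathcal{A}$ with $\mathcal{A} \cap \mathcal{D} \neq \emptyset$ and $\mathcal{A} \cap \mathcal{D}^c = \mathcal{D}^c$, $I(\mathcal{A}) \geq I(\mathcal{D}^c)$. Letting $\mathcal{A}_1 = \mathcal{A} \cap \mathcal{D}$, we have

$$\begin{aligned}
I(\mathcal{A}) &= I(\mathcal{A}_1 \cup \mathcal{D}^c) \\
&= I(\mathcal{A}_1) + I(\mathcal{D}^c) - I(\hat{Y}_{\mathcal{A}_1}; \hat{Y}_{\mathcal{D}^c} | \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{D}^c)^c}, Y) \\
&= \sum_{i \in \mathcal{A}_1} R_i - I(\hat{Y}_{\mathcal{A}_1}; Y_{\mathcal{A}_1} | \hat{Y}_{\mathcal{A}_1^c}, Y) - I(\hat{Y}_{\mathcal{A}_1}; \hat{Y}_{\mathcal{D}^c} | \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{D}^c)^c}, Y) + I(\mathcal{D}^c) \\
&= \sum_{i \in \mathcal{A}_1} R_i - I(\hat{Y}_{\mathcal{A}_1}; \hat{Y}_{\mathcal{D}^c}, Y_{\mathcal{A}_1} | \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{D}^c)^c}, Y) + I(\mathcal{D}^c) \\
&= \sum_{i \in \mathcal{A}_1} R_i - I(\hat{Y}_{\mathcal{A}_1}; Y_{\mathcal{A}_1} | \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y) + I(\mathcal{D}^c) \\
&= I_{\mathcal{D}}(\mathcal{A}_1) + I(\mathcal{D}^c) \\
&\geq I(\mathcal{D}^c).
\end{aligned}$$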

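Combining 2)-4), we can conclude that $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$.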

ii) We now argue that under the optimal $p(x)\prod_{i=1}^{n} p(\hat{y}_i|y_i)$ that achieves $R^*_{C/F/J}$, if $\mathcal{D}^c \neq \emptyset$, then $R^*_{C/F/J}$ is not optimal; and hence $\mathcal{D}^c$ must be $\emptyset$. The argument is extended from that in [7], and the detailed analysis is as follows.

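Suppose $\mathcal{D}^c \neq \emptyset$ at the optimum. Then, $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$. Therefore,

$$\begin{aligned}
R^*_{C/F/J} &= I(X; \hat{Y}_{\mathcal{N}}, Y) + I(\mathcal{D}^c) \\
&= I(X; \hat{Y}_{\mathcal{D}}, Y) + I(X; \hat{Y}_{\mathcal{D}^c} | \hat{Y}_{\mathcal{D}}, Y) + \sum_{i \in \mathcal{D}^c} R_i - I(X, Y_{\mathcal{D}^c}; \hat{Y}_{\mathcal{D}^c} | \hat{Y}_{\mathcal{D}}, Y) \\
&= I(X; \hat{Y}_{\mathcal{D}}, Y) + \sum_{i \in \mathcal{D}^c} R_i - I(Y_{\mathcal{D}^c}; \hat{Y}_{\mathcal{D}^c} | X, \hat{Y}_{\mathcal{D}}, Y), \qquad (31)
\end{aligned}$$

and similarly,

$$\begin{aligned}
R^*_{C/F/J} &= I(X; \hat{Y}_{\mathcal{N}}, Y) + I(\mathcal{T}) \\
&= I(X; \hat{Y}_{\mathcal{T}^c}, Y) + \sum_{i \in \mathcal{T}} R_i - I(Y_{\mathcal{T}}; \hat{Y}_{\mathcal{T}} | X, \hat{Y}_{\mathcal{T}^c}, Y), \qquad (32)
\end{aligned}$$

for any $\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$, $\mathcal{T} \neq \mathcal{D}^c$.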

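We argue that a higher rate can be achieved. Consider $\hat{Y}'_1, \hat{Y}'_2, \ldots, \hat{Y}'_n$, where $\hat{Y}'_i = \hat{Y}_i$ for any $i \in \mathcal{D}$, and $\hat{Y}'_i = \hat{Y}_i$ with probability $p$ and $\hat{Y}'_i = \emptyset$ with probability $1 - p$ for any $i \in \mathcal{D}^c$. When $p = 1$, the achievable rate with $\hat{Y}'_1, \hat{Y}'_2, \ldots, \hat{Y}'_n$ is $R^*_{C/F/J}$. As $p$ decreases from 1, it can be seen from (31) and (32) that both $I(X; \hat{Y}'_{\mathcal{N}}, Y) + I(\mathcal{D}^c)$ and $I(X; \hat{Y}'_{\mathcal{N}}, Y) + I(\mathcal{T})$ will increase, where $\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$, $\mathcal{T} \neq \mathcal{D}^c$. Thus, no matter how $I(X; \hat{Y}'_{\mathcal{N}}, Y) + I(\mathcal{S})$ changes as $p$ decreases for $\mathcal{S} \notin \arg\min_{\mathcal{S} \subseteq \mathcal{N}} I(\mathcal{S})$, it is certain that there exists a $p^*$ such that the achievable rate by using $\hat{Y}'_1, \hat{Y}'_2, \ldots, \hat{Y}'_n$ is larger than $R^*_{C/F/J}$. This contradicts the optimality of $R^*_{C/F/J}$, and thus at the optimum, $\mathcal{D}^c$ must be $\emptyset$, i.e., $I(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$. This completes the proof of Theorem 2.3.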
IV. Decoding After All Blocks Have Been Finished

In this section, our discussion turns to the compress-and-forward schemes with decoding after
all blocks have been finished. The focus here is on cumulative encoding/block-by-block backward
decoding, since it is the simplest scheme to achieve the highest rate in the general multiple-relay channel,
as mentioned before; for repetitive encoding/all blocks united decoding, see the proof of Theorem 1
in [3].

Cumulative encoding/block-by-block backward decoding can be combined with either compression-message
successive decoding or compression-message joint decoding. In the following, we will first present
the cumulative encoding/block-by-block backward decoding/compression-message successive decoding
scheme to establish the achievable rate in Theorem 2.5, and demonstrate the optimality of successive
decoding in the sense of Theorem 2.6. Then, the cumulative encoding/block-by-block backward
decoding/compression-message joint decoding scheme will be used to prove Theorem 2.7, and the necessity
of joint decodability is demonstrated in the sense that only those relay nodes whose compressions
can eventually be decoded by joint decoding are helpful to the decoding of the original message.



A. Cumulative encoding/block-by-block backward decoding/compression-message successive decoding and 
Optimality of successive decoding 

In cumulative encoding/block-by-block backward decoding, the encoding process is similar to that in
the proof of Theorem 6 in [2] (except that the binning at the relay is not needed here), but the decoding
process operates backward. This scheme, combined with compression-message successive decoding,
proves Theorem 2.5 as follows.



Proof of Theorem 2.5 



Codebook Generation: Fix $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$. Consider $B + M$ blocks, where the source will
transmit information in the first $B$ blocks and keep silent in the last $M$ blocks, and $M \ll B$ such that
the rate loss can be made arbitrarily small. We randomly and independently generate a codebook for each
block.
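The rate loss caused by the $M$ silent blocks can be made concrete with a few lines of arithmetic. The sketch below is only an illustration under the assumption that every one of the first $B$ blocks carries the same rate; the function names are hypothetical and do not come from the paper.

```python
def effective_rate(rate_per_block: float, num_message_blocks: int,
                   num_silent_blocks: int) -> float:
    """Average rate over B + M blocks when only the first B blocks carry messages."""
    total_blocks = num_message_blocks + num_silent_blocks
    return rate_per_block * num_message_blocks / total_blocks


def rate_loss_fraction(num_message_blocks: int, num_silent_blocks: int) -> float:
    """Fraction of the per-block rate lost to the M silent blocks: M / (B + M)."""
    return num_silent_blocks / (num_message_blocks + num_silent_blocks)
```

For a fixed $M$, the lost fraction $M/(B+M)$ vanishes as $B$ grows, which is the sense in which $M \ll B$ makes the loss arbitrarily small.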

For each block $b \in [1 : B + M]$, randomly and independently generate $2^{TR_{C/B/S}}$ sequences $\mathbf{x}_b(m_b)$,
$m_b \in [1 : 2^{TR_{C/B/S}}]$; for each block $b \in [1 : B + M]$ and each relay node $i \in \mathcal{N}$, randomly and independently
generate $2^{TR_i}$ sequences $\mathbf{x}_{i,b}(l_{i,b-1})$, $l_{i,b-1} \in [1 : 2^{TR_i}]$, where $R_i = I(\hat{Y}_i; Y_i|X_i) + \epsilon$; for each relay node
$i \in \mathcal{N}$ and each $\mathbf{x}_{i,b}(l_{i,b-1})$, $l_{i,b-1} \in [1 : 2^{TR_i}]$, randomly and conditionally independently generate $2^{TR_i}$
sequences $\hat{\mathbf{y}}_{i,b}(l_{i,b}|l_{i,b-1})$, $l_{i,b} \in [1 : 2^{TR_i}]$. This defines the codebook for any block $b \in [1 : B + M]$,

$$\mathcal{C}_b = \{\mathbf{x}_b(m_b), \mathbf{x}_{i,b}(l_{i,b-1}), \hat{\mathbf{y}}_{i,b}(l_{i,b}|l_{i,b-1}) : m_b \in [1 : 2^{TR_{C/B/S}}],\ l_{i,b}, l_{i,b-1} \in [1 : 2^{TR_i}],\ i \in \mathcal{N}\}.$$

Encoding: Let $\mathbf{m} = (m_1, m_2, \ldots, m_B)$ be the message vector to be sent and let $m_b = 1$ be the dummy
message for any $b \in [B + 1 : B + M]$. For any block $b \in [1 : B + M]$, each relay node $i \in \mathcal{N}$, upon receiving
$\mathbf{y}_{i,b}$ at the end of block $b$, finds an index $l_{i,b}$ such that $(\mathbf{x}_{i,b}(l_{i,b-1}), \mathbf{y}_{i,b}, \hat{\mathbf{y}}_{i,b}(l_{i,b}|l_{i,b-1})) \in A_\epsilon(X_i, Y_i, \hat{Y}_i)$,
where $l_{i,0} = 1$ by convention. The codewords $\mathbf{x}_b(m_b)$ and $\mathbf{x}_{i,b}(l_{i,b-1})$, $i \in \mathcal{N}$, are transmitted in block $b$,
$b \in [1 : B + M]$.

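Decoding: i) The destination first finds a unique combination of the relays' compression indices $\mathbf{l}^B = (\mathbf{l}_1, \ldots, \mathbf{l}_B)$ and some $\mathbf{l}_{B+1}^{B+M} = (\mathbf{l}_{B+1}, \ldots, \mathbf{l}_{B+M})$, where $\mathbf{l}_b = (l_{1,b}, \ldots, l_{n,b})$, $\forall b \in [1 : B + M]$, such that
for any $b = 1, \ldots, B + M$,

$$\big((\mathbf{x}_{1,b}(l_{1,b-1}), \hat{\mathbf{y}}_{1,b}(l_{1,b}|l_{1,b-1})), \ldots, (\mathbf{x}_{n,b}(l_{n,b-1}), \hat{\mathbf{y}}_{n,b}(l_{n,b}|l_{n,b-1})), \mathbf{y}_b\big) \in A_\epsilon(X_{\mathcal{N}}, \hat{Y}_{\mathcal{N}}, Y). \qquad (33)$$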

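Specifically, this can be done backwards as follows:

a) The destination finds the unique $\mathbf{l}_B$ such that there exists some $\mathbf{l}_{B+1}^{B+M} = (\mathbf{l}_{B+1}, \ldots, \mathbf{l}_{B+M})$ satisfying (33) for any $b = B + 1, \ldots, B + M$.

Assume the true $\mathbf{l}_{B}^{B+M} = \mathbf{1}$. Then, an error occurs if $\mathbf{l}_B = \mathbf{1}$ does not satisfy (33) with any $\mathbf{l}_{B+1}^{B+M}$ for any $b = B + 1, \ldots, B + M$, or a false $\mathbf{l}_B \neq \mathbf{1}$ satisfies (33) with some $\mathbf{l}_{B+1}^{B+M}$ for any $b = B + 1, \ldots, B + M$. Since $\mathbf{l}_{B}^{B+M} = \mathbf{1}$ satisfies (33) for any $b = B + 1, \ldots, B + M$ with high probability according to the properties of typical sequences, we only need to bound $\Pr(\bigcup_{\mathbf{l}_B \neq \mathbf{1}} \mathcal{E}_{\mathbf{l}_B})$, where $\mathcal{E}_{\mathbf{l}_B}$ is defined as the event that $\mathbf{l}_B$ satisfies (33) with some $\mathbf{l}_{B+1}^{B+M}$ for any $b = B + 1, \ldots, B + M$. For any $(\mathbf{l}_{b-1}, \mathbf{l}_b)$, define $A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)$ as the event that $(\mathbf{l}_{b-1}, \mathbf{l}_b)$ satisfies (33). Then, we have

$$\begin{aligned}
\Pr\Big(\bigcup_{\mathbf{l}_B \neq \mathbf{1}} \mathcal{E}_{\mathbf{l}_B}\Big)
&= \Pr\Big(\bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcup_{\mathbf{l}_{B+1}^{B+M}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big) \\
&= \Pr\Big(\Big[\bigcup_{j=1}^{M-1} \bigcup_{\mathbf{l}_{B+1}^{B+M} : \mathbf{l}_{B+j} = \mathbf{1}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big] \cup \Big[\bigcup_{\substack{\mathbf{l}_{B+1}^{B+M} : \mathbf{l}_{B+j} \neq \mathbf{1}, \\ \forall j \in [1 : M-1]}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big]\Big) \\
&\leq \sum_{j=1}^{M-1} \Pr\Big(\bigcup_{\mathbf{l}_{B+1}^{B+M} : \mathbf{l}_{B+j} = \mathbf{1}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big) + \Pr\Big(\bigcup_{\substack{\mathbf{l}_{B+1}^{B+M} : \mathbf{l}_{B+j} \neq \mathbf{1}, \\ \forall j \in [1 : M-1]}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big). \qquad (34)
\end{aligned}$$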



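Let us first consider the second term in (34). For any $\mathbf{l}_{B}^{B+M}$, let $\mathcal{S}_b(\mathbf{l}_{B}^{B+M}) = \{i \in \mathcal{N} : l_{i,b-1} \neq 1\}$. Note that $\mathcal{S}_b(\mathbf{l}_{B}^{B+M})$ only depends on $\mathbf{l}_{b-1}$, so we also write it as $\mathcal{S}_b(\mathbf{l}_{b-1})$. Define $\mathbf{X}_b(\mathcal{S}_b(\mathbf{l}_{b-1}))$ as $\{\mathbf{X}_{i,b}(l_{i,b-1}), i \in \mathcal{S}_b(\mathbf{l}_{b-1})\}$, and similarly define $\hat{\mathbf{Y}}_b(\mathcal{S}_b(\mathbf{l}_{b-1}))$ and the corresponding quantities for the complement $\mathcal{S}_b^c(\mathbf{l}_{b-1})$. Then, $(\mathbf{X}_b(\mathcal{S}_b(\mathbf{l}_{b-1})), \hat{\mathbf{Y}}_b(\mathcal{S}_b(\mathbf{l}_{b-1})))$ is
independent of $(\mathbf{X}_b(\mathcal{S}_b^c(\mathbf{l}_{b-1})), \hat{\mathbf{Y}}_b(\mathcal{S}_b^c(\mathbf{l}_{b-1})), \mathbf{Y}_b)$, and, writing $\mathcal{S}_b$ for $\mathcal{S}_b(\mathbf{l}_{b-1})$, $\Pr(A_b(\mathbf{l}_{b-1}, \mathbf{l}_b))$ can be upper bounded by

$$2^{T(H(X_{\mathcal{N}}, \hat{Y}_{\mathcal{N}}, Y) + \epsilon)}\, 2^{-T(H(X_{\mathcal{S}_b^c}, \hat{Y}_{\mathcal{S}_b^c}, Y) - \epsilon)}\, 2^{-T(H(X_{\mathcal{S}_b}) - \epsilon)}\, 2^{-T\sum_{i \in \mathcal{S}_b}(H(\hat{Y}_i|X_i) - \epsilon)} =: 2^{-T(I(\mathcal{S}_b(\mathbf{l}_{b-1})) - \epsilon')},$$

where, in this proof, $I(\mathcal{S}_b) = I(X_{\mathcal{S}_b}; \hat{Y}_{\mathcal{S}_b^c}, Y | X_{\mathcal{S}_b^c}) + \sum_{i \in \mathcal{S}_b} H(\hat{Y}_i|X_i) - H(\hat{Y}_{\mathcal{S}_b} | X_{\mathcal{N}}, \hat{Y}_{\mathcal{S}_b^c}, Y)$
and $\epsilon' \to 0$ as $\epsilon \to 0$. Then, we have

$$\begin{aligned}
& \Pr\Big(\bigcup_{\substack{\mathbf{l}_{B+1}^{B+M} : \mathbf{l}_{B+j} \neq \mathbf{1}, \\ \forall j \in [1 : M-1]}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big) \\
&\leq \sum_{\mathbf{l}_{B+M}} \sum_{\substack{\mathbf{l}_{B}^{B+M-1} : \mathbf{l}_{B+j} \neq \mathbf{1},\ \mathbf{l}_B \neq \mathbf{1}, \\ \forall j \in [1 : M-1]}} \prod_{b=B+1}^{B+M} 2^{-T(I(\mathcal{S}_b(\mathbf{l}_{b-1})) - \epsilon')} \\
&\leq \sum_{\mathbf{l}_{B+M}} \sum_{\substack{\mathcal{S}_{B+1}, \ldots, \mathcal{S}_{B+M} : \\ \mathcal{S}_{B+j} \neq \emptyset,\ \forall j \in [1 : M]}} \sum_{\substack{\mathbf{l}_{B}^{B+M-1} : \mathcal{S}_b(\mathbf{l}_{b-1}) = \mathcal{S}_b, \\ \forall b \in [B+1 : B+M]}} \prod_{b=B+1}^{B+M} 2^{-T(I(\mathcal{S}_b(\mathbf{l}_{b-1})) - \epsilon')} \\
&\leq \sum_{\mathbf{l}_{B+M}} \sum_{\substack{\mathcal{S}_{B+1}, \ldots, \mathcal{S}_{B+M} : \\ \mathcal{S}_{B+j} \neq \emptyset,\ \forall j \in [1 : M]}} \prod_{b=B+1}^{B+M} 2^{T\sum_{i \in \mathcal{S}_b}(I(\hat{Y}_i; Y_i|X_i) + \epsilon)} \prod_{b=B+1}^{B+M} 2^{-T(I(\mathcal{S}_b) - \epsilon')} \\
&\leq \sum_{\mathbf{l}_{B+M}} \sum_{\substack{\mathcal{S}_{B+1}, \ldots, \mathcal{S}_{B+M} : \\ \mathcal{S}_{B+j} \neq \emptyset,\ \forall j \in [1 : M]}} \prod_{b=B+1}^{B+M} 2^{-T(I(X_{\mathcal{S}_b}; \hat{Y}_{\mathcal{S}_b^c}, Y | X_{\mathcal{S}_b^c}) - I(\hat{Y}_{\mathcal{S}_b}; Y_{\mathcal{S}_b} | X_{\mathcal{N}}, \hat{Y}_{\mathcal{S}_b^c}, Y) - \epsilon'')} \\
&\leq \sum_{\mathbf{l}_{B+M}} \sum_{\substack{\mathcal{S}_{B+1}, \ldots, \mathcal{S}_{B+M} : \\ \mathcal{S}_{B+j} \neq \emptyset,\ \forall j \in [1 : M]}} 2^{-TM(\min_{\mathcal{S} \subseteq \mathcal{N} : \mathcal{S} \neq \emptyset}\{I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{N}}, \hat{Y}_{\mathcal{S}^c}, Y)\} - \epsilon'')} \\
&\leq \sum_{\mathbf{l}_{B+M}} (2^n)^M\, 2^{-TM(\min_{\mathcal{S} \subseteq \mathcal{N} : \mathcal{S} \neq \emptyset}\{I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{N}}, \hat{Y}_{\mathcal{S}^c}, Y)\} - \epsilon'')} \\
&\leq 2^{T\sum_{i \in \mathcal{N}}(I(\hat{Y}_i; Y_i|X_i) + \epsilon)}\, 2^{nM}\, 2^{-TM(\min_{\mathcal{S} \subseteq \mathcal{N} : \mathcal{S} \neq \emptyset}\{I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{N}}, \hat{Y}_{\mathcal{S}^c}, Y)\} - \epsilon'')},
\end{aligned}$$

where $\epsilon'' \to 0$ as $\epsilon \to 0$. Thus, as both $T$ and $M$ go to infinity, the second term in (34) goes to 0, if (12)
holds.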
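To see why $M$ must also grow, it can help to evaluate the exponent of the last bound numerically. The sketch below is a hedged illustration: `sum_rates` stands for $\sum_{i \in \mathcal{N}}(I(\hat{Y}_i; Y_i|X_i) + \epsilon)$ and `j_min` for the minimum over nonempty $\mathcal{S}$ in the exponent; both are supplied as plain numbers, and the function names are hypothetical.

```python
def log2_bound(T, M, n, sum_rates, j_min, eps2=0.0):
    """log2 of the final bound: T*sum_rates + n*M - T*M*(j_min - eps2)."""
    return T * sum_rates + n * M - T * M * (j_min - eps2)


def smallest_M(T, n, sum_rates, j_min, eps2=0.0, max_M=10**6):
    """Smallest M >= 1 making the exponent negative, or None if there is none."""
    # Each extra silent block changes the exponent by n - T*(j_min - eps2);
    # if that is not negative, no M can drive the bound below 1.
    if n - T * (j_min - eps2) >= 0:
        return None
    for M in range(1, max_M + 1):
        if log2_bound(T, M, n, sum_rates, j_min, eps2) < 0:
            return M
    return None
```

The bound decays only when the minimum in the exponent is strictly positive, and $M$ has to be large enough to overcome the $2^{T\sum_i R_i}$ choices of $\mathbf{l}_{B+M}$.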

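Now consider the first term in (34). For any $j \in [1 : M - 1]$, we have

$$\Pr\Big(\bigcup_{\mathbf{l}_{B+1}^{B+M} : \mathbf{l}_{B+j} = \mathbf{1}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+M} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big) \leq \Pr\Big(\bigcup_{\mathbf{l}_{B+1}^{B+j} : \mathbf{l}_{B+j} = \mathbf{1}} \bigcup_{\mathbf{l}_B \neq \mathbf{1}} \bigcap_{b=B+1}^{B+j} A_b(\mathbf{l}_{b-1}, \mathbf{l}_b)\Big).$$

Note that the right-hand side is the probability that there exists a false $\mathbf{l}_B \neq \mathbf{1}$ that satisfies
(33) with some $\mathbf{l}_{B+1}^{B+j}$ for any block $b \in [B + 1 : B + j]$, where $\mathbf{l}_{B+j} = \mathbf{1}$ is true. We can show that this probability
goes to 0 with the idea of backward decoding as follows.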

Specifically, backwards and sequentially from block $b = B + j$ to block $b = B + 1$, the destination
finds the unique $\mathbf{l}_{b-1}$ such that $(\mathbf{l}_{b-1}, \mathbf{l}_b)$ satisfies (33), where $\mathbf{l}_b$ has already been recovered due to the
backwards property of decoding. At each block $b = B + j, B + j - 1, \ldots, B + 1$, an error occurs if the true
$\mathbf{l}_{b-1}$ does not satisfy (33), or a false $\mathbf{l}_{b-1}$ satisfies (33). According to the properties of typical sequences,
the true $\mathbf{l}_{b-1}$ satisfies (33) with high probability.

For a false $\mathbf{l}_{b-1}$ with false $\{l_{i,b-1}, i \in \mathcal{S}\}$ but true $\{l_{i,b-1}, i \in \mathcal{S}^c\}$, $(\mathbf{X}_b(\mathcal{S}), \hat{\mathbf{Y}}_b(\mathcal{S}))$ is independent of
$(\mathbf{X}_b(\mathcal{S}^c), \hat{\mathbf{Y}}_b(\mathcal{S}^c), \mathbf{Y}_b)$, and the probability that $(\mathbf{l}_{b-1}, \mathbf{l}_b)$ satisfies (33) can be upper bounded by

$$2^{T(H(X_{\mathcal{N}}, \hat{Y}_{\mathcal{N}}, Y) + \epsilon)}\, 2^{-T(H(X_{\mathcal{S}^c}, \hat{Y}_{\mathcal{S}^c}, Y) - \epsilon)}\, 2^{-T(H(X_{\mathcal{S}}) - \epsilon)}\, 2^{-T\sum_{i \in \mathcal{S}}(H(\hat{Y}_i|X_i) - \epsilon)}.$$

Since the number of such false $\mathbf{l}_{b-1}$ is upper bounded by $\prod_{i \in \mathcal{S}} 2^{T(I(\hat{Y}_i; Y_i|X_i) + \epsilon)}$, with the union bound, it
is easy to check that the probability of finding such a false $\mathbf{l}_{b-1}$ goes to zero as $T \to \infty$, if (12) holds.

Therefore, if (12) holds, the first term in (34) also goes to 0 as $T \to \infty$, and $\mathbf{l}_B$ can be decoded.



b) Given that $\mathbf{l}_B$ has been recovered, the destination performs backward decoding similar to the above.
That is, backwards and sequentially from block $b = B$ to block $b = 2$, the destination finds the unique
$\mathbf{l}_{b-1}$ such that $(\mathbf{l}_{b-1}, \mathbf{l}_b)$ satisfies (33), where $\mathbf{l}_b$ has already been recovered. From the above analysis, it
follows that at each block $b = B, B - 1, \ldots, 2$, the probability of decoding error goes to zero as $T \to \infty$,
if (12) holds. This combined with a) implies that $\mathbf{l}^B$ can be decoded, if (12) holds.

ii) Then, based on the recovered $\mathbf{l}^B$, the destination finds the unique $\mathbf{m}$ such that for any $b = 1, \ldots, B$,

$$\big(\mathbf{x}_b(m_b), (\mathbf{x}_{1,b}(l_{1,b-1}), \hat{\mathbf{y}}_{1,b}(l_{1,b}|l_{1,b-1})), \ldots, (\mathbf{x}_{n,b}(l_{n,b-1}), \hat{\mathbf{y}}_{n,b}(l_{n,b}|l_{n,b-1})), \mathbf{y}_b\big) \in A_\epsilon(X, X_{\mathcal{N}}, \hat{Y}_{\mathcal{N}}, Y). \qquad (35)$$

Obviously, the probability of decoding error will tend to zero if $R_{C/B/S} < I(X; \hat{Y}_{\mathcal{N}}, Y | X_{\mathcal{N}})$.
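The order of operations in the backward decoder can be summarized as a simple schedule. The following sketch only enumerates which quantity is recovered from which blocks; it does not perform any typicality test, and the string labels are hypothetical.

```python
def backward_decoding_schedule(B: int, M: int):
    """List the decoding steps as (recovered quantity, blocks whose outputs are used)."""
    if B < 1 or M < 1:
        raise ValueError("B and M must be positive")
    steps = []
    # Step i-a): l_B is recovered jointly from the M silent blocks B+1, ..., B+M.
    steps.append((f"l_{B}", list(range(B + 1, B + M + 1))))
    # Step i-b): for b = B, B-1, ..., 2, block b yields l_{b-1} given the known l_b.
    for b in range(B, 1, -1):
        steps.append((f"l_{b - 1}", [b]))
    # Step ii): with all compression indices known, each m_b is decoded from block b.
    steps.append(("m", list(range(1, B + 1))))
    return steps
```

With `B = 3` and `M = 2`, the schedule first recovers `l_3` from blocks 4 and 5, then `l_2` and `l_1` from blocks 3 and 2, and finally the message vector from blocks 1 to 3.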



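We are now in a position to prove Theorem 2.6. To facilitate the proof, we introduce some notations
and lemmas. Let

$$\begin{aligned}
J_{\mathcal{A},\mathcal{B}}(\mathcal{S}) &:= I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, \hat{Y}_{\mathcal{A}}, Y | X_{\mathcal{A}}, X_{\mathcal{B} \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}),\ \forall \mathcal{S} \subseteq \mathcal{B}, \qquad (36) \\
J_{\mathcal{B}}(\mathcal{S}) &:= J_{\emptyset,\mathcal{B}}(\mathcal{S}) = I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y | X_{\mathcal{B} \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{B}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y),\ \forall \mathcal{S} \subseteq \mathcal{B}, \qquad (37) \\
J(\mathcal{S}) &:= J_{\mathcal{N}}(\mathcal{S}) = I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}),\ \forall \mathcal{S} \subseteq \mathcal{N}. \qquad (38)
\end{aligned}$$

Then, we have the following lemmas, whose proofs will be presented in Appendix B.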

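Lemma 4.1: 1) If $J_{\mathcal{A}}(\mathcal{S}_1) \geq 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $J_{\mathcal{B}}(\mathcal{S}_2) \geq 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then $J_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.

2) If $J_{\mathcal{A}}(\mathcal{S}_1) \geq 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $J_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) \geq 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then $J_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.

Lemma 4.2: Under any $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$, there exists a unique set $\mathcal{D}$, which is the largest
subset of $\mathcal{N}$ satisfying

$$J_{\mathcal{D}}(\mathcal{S}) \geq 0,\ \forall \mathcal{S} \subseteq \mathcal{D}.$$

Lemma 4.3: If $J_{\mathcal{A},\mathcal{B}}(\mathcal{B}) \geq 0$ for some nonempty $\mathcal{B}$, then there exists some nonempty $\mathcal{C} \subseteq \mathcal{B}$ such that
$J_{\mathcal{A},\mathcal{C}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{C}$.

Lemma 4.4: For any $\mathcal{A}$ and $\mathcal{B}$ with $\mathcal{A} \cap \mathcal{B} = \emptyset$, $J(\mathcal{A}) + J(\mathcal{B}) = J(\mathcal{A} \cup \mathcal{B}) + J(\mathcal{A} \circ \mathcal{B})$, where

$$\begin{aligned}
J(\mathcal{A} \circ \mathcal{B}) &:= I(X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}; X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}} | X_{(\mathcal{A} \cup \mathcal{B})^c}, \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) \\
&= I(X_{\mathcal{A}}; X_{\mathcal{B}} | X_{(\mathcal{A} \cup \mathcal{B})^c}, \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) + I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} | X_{\mathcal{A}^c}, \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) \\
&\quad + I(X_{\mathcal{B}}; \hat{Y}_{\mathcal{A}} | X_{\mathcal{B}^c}, \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) + I(\hat{Y}_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} | X_{\mathcal{N}}, \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y).
\end{aligned}$$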



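The proof of Theorem 2.6 is similar to the proof of Theorem 2.3, and the details are as follows.

Proof of Theorem 2.6: $R^*_{C/B/S}$ and $R^*_{R/U/J}$ can be respectively written as

$$R^*_{C/B/S} = \max_{p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)} I(X; \hat{Y}_{\mathcal{N}}, Y | X_{\mathcal{N}}) \qquad (39)$$

$$\text{such that } J(\mathcal{S}) \geq 0,\ \forall \mathcal{S} \subseteq \mathcal{N}, \qquad (40)$$

and

$$\begin{aligned}
R^*_{R/U/J} &= \max_{p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)} \min_{\mathcal{S} \subseteq \mathcal{N}} \big\{I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c})\big\} \\
&= \max_{p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)} \min_{\mathcal{S} \subseteq \mathcal{N}} \big\{I(X; \hat{Y}_{\mathcal{N}}, Y | X_{\mathcal{N}}) + J(\mathcal{S})\big\}. \qquad (41)
\end{aligned}$$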
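We show $R^*_{C/B/S} = R^*_{R/U/J}$ by showing that $R^*_{C/B/S} \leq R^*_{R/U/J}$ and $R^*_{C/B/S} \geq R^*_{R/U/J}$ respectively. Under any
$p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$ such that $J(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$, we have

$$\min_{\mathcal{S} \subseteq \mathcal{N}} \{I(X; \hat{Y}_{\mathcal{N}}, Y | X_{\mathcal{N}}) + J(\mathcal{S})\} = I(X; \hat{Y}_{\mathcal{N}}, Y | X_{\mathcal{N}}),$$

and thus $R^*_{C/B/S} \leq R^*_{R/U/J}$.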

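To show $R^*_{C/B/S} \geq R^*_{R/U/J}$, it is sufficient to show that $R^*_{R/U/J}$ can be achieved only with a distribution
$p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$ such that $J(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$. We will show this in two steps as follows:
i) We first show that under any $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$, if $\mathcal{D}^c \neq \emptyset$, then $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$ and
$\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$, where $\mathcal{D}$ is defined as in Lemma 4.2 and $\arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S}) := \{\mathcal{T} \subseteq \mathcal{N} : J(\mathcal{T}) = \min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})\}$. ii) We then argue that, under the optimal $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$, $\mathcal{D}^c$ must be $\emptyset$, i.e.,
$\mathcal{D}$ must be $\mathcal{N}$, and thus by the definition of $\mathcal{D}$, $J(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$.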

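i) Assuming $\mathcal{D}^c \neq \emptyset$ throughout Part i), we show $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$.

1) We first show $J(\mathcal{D}^c) < 0$ by using a contradiction argument. Suppose $J(\mathcal{D}^c) \geq 0$, i.e., $J_{\mathcal{D},\mathcal{D}^c}(\mathcal{D}^c) \geq 0$.
Then, by Lemma 4.3, there exists some nonempty $\mathcal{B} \subseteq \mathcal{D}^c$ such that $J_{\mathcal{D},\mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{B}$.
By Part 2) of Lemma 4.1, this further implies that $J_{\mathcal{D} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{D} \cup \mathcal{B}$. This contradicts
the definition of $\mathcal{D}$, and thus $J(\mathcal{D}^c) < 0$.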

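2) We show that $\forall \mathcal{A} \subseteq \mathcal{D}^c$ with $\mathcal{A} \neq \mathcal{D}^c$, $J(\mathcal{A}) > J(\mathcal{D}^c)$, and thus $J(\mathcal{A}) > \min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$. The proof
is still by contradiction. Suppose that there exists some $\mathcal{A} \subseteq \mathcal{D}^c$ with $\mathcal{A} \neq \mathcal{D}^c$ such that $J(\mathcal{A}) \leq J(\mathcal{D}^c)$.
Then $J(\mathcal{D}^c) - J(\mathcal{A}) \geq 0$, i.e.,

$$\begin{aligned}
& I(X_{\mathcal{D}^c}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{D}}) - I(\hat{Y}_{\mathcal{D}^c}; Y_{\mathcal{D}^c} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}}) - I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{A}^c}, Y | X_{\mathcal{A}^c}) + I(\hat{Y}_{\mathcal{A}}; Y_{\mathcal{A}} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{A}^c}) \\
&= I(X_{\mathcal{D}^c \setminus \mathcal{A}}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{D}}) + I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{A}^c}) - I(\hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}}; Y_{\mathcal{D}^c \setminus \mathcal{A}} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}}) - I(\hat{Y}_{\mathcal{A}}; Y_{\mathcal{A}} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{A}^c}) \\
&\quad - I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{A}^c}) - I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}} | \hat{Y}_{\mathcal{D}}, Y, X_{\mathcal{A}^c}) + I(\hat{Y}_{\mathcal{A}}; Y_{\mathcal{A}} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{A}^c}) \\
&= I(X_{\mathcal{D}^c \setminus \mathcal{A}}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{D}}) - H(\hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}}) + H(\hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}} | Y_{\mathcal{D}^c \setminus \mathcal{A}}, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}}) \\
&\quad - H(\hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}} | \hat{Y}_{\mathcal{D}}, Y, X_{\mathcal{A}^c}) + H(\hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{D}}, Y, X_{\mathcal{A}^c}) \\
&= I(X_{\mathcal{D}^c \setminus \mathcal{A}}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{D}}) - I(\hat{Y}_{\mathcal{D}^c \setminus \mathcal{A}}; Y_{\mathcal{D}^c \setminus \mathcal{A}} | X_{\mathcal{D}}, X_{\mathcal{D}^c \setminus \mathcal{A}}, Y, \hat{Y}_{\mathcal{D}}) \\
&= J_{\mathcal{D}, \mathcal{D}^c \setminus \mathcal{A}}(\mathcal{D}^c \setminus \mathcal{A}) \\
&\geq 0.
\end{aligned}$$

Again by Lemma 4.3 and Lemma 4.1 successively, we can conclude that there exists some nonempty $\mathcal{B} \subseteq \mathcal{D}^c \setminus \mathcal{A}$
such that $J_{\mathcal{D} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{D} \cup \mathcal{B}$, which is a contradiction. Therefore, $J(\mathcal{A}) > J(\mathcal{D}^c) \geq \min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$.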

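3) We prove that $\forall \mathcal{A}$ with $\mathcal{A} \cap \mathcal{D} \neq \emptyset$ and $\mathcal{A} \cap \mathcal{D}^c \neq \mathcal{D}^c$, $J(\mathcal{A}) > J(\mathcal{A} \cup \mathcal{D}^c) \geq \min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$. Let $\mathcal{A}_1 = \mathcal{A} \cap \mathcal{D}$
and $\mathcal{A}_2 = \mathcal{A} \cap \mathcal{D}^c$. Then, we have, by Lemma 4.4, that

$$\begin{aligned}
J(\mathcal{A}) &= J(\mathcal{A}_1 \cup \mathcal{A}_2) = J(\mathcal{A}_1) + J(\mathcal{A}_2) - J(\mathcal{A}_1 \circ \mathcal{A}_2), \\
J(\mathcal{A}_1 \cup \mathcal{D}^c) &= J(\mathcal{A}_1) + J(\mathcal{D}^c) - J(\mathcal{A}_1 \circ \mathcal{D}^c).
\end{aligned}$$

Since $J(\mathcal{A}_2) > J(\mathcal{D}^c)$ by 2), to show $J(\mathcal{A}) > J(\mathcal{A} \cup \mathcal{D}^c) \geq \min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$, we only need to show
$J(\mathcal{A}_1 \circ \mathcal{A}_2) \leq J(\mathcal{A}_1 \circ \mathcal{D}^c)$. Let $\mathcal{A}_3 = \mathcal{D}^c \setminus \mathcal{A}_2$ and, for brevity, write $\mathcal{U} = \mathcal{A}_1 \cup \mathcal{A}_2 \cup \mathcal{A}_3$. Then, we have

$$\begin{aligned}
& J(\mathcal{A}_1 \circ \mathcal{D}^c) - J(\mathcal{A}_1 \circ \mathcal{A}_2) \\
&= I(X_{\mathcal{A}_1}; X_{\mathcal{A}_2 \cup \mathcal{A}_3} | X_{\mathcal{U}^c}, \hat{Y}_{\mathcal{U}^c}, Y) + I(X_{\mathcal{A}_1}; \hat{Y}_{\mathcal{A}_2 \cup \mathcal{A}_3} | X_{\mathcal{A}_1^c}, \hat{Y}_{\mathcal{U}^c}, Y) \\
&\quad + I(X_{\mathcal{A}_2 \cup \mathcal{A}_3}; \hat{Y}_{\mathcal{A}_1} | X_{(\mathcal{A}_2 \cup \mathcal{A}_3)^c}, \hat{Y}_{\mathcal{U}^c}, Y) + I(\hat{Y}_{\mathcal{A}_1}; \hat{Y}_{\mathcal{A}_2 \cup \mathcal{A}_3} | X_{\mathcal{N}}, \hat{Y}_{\mathcal{U}^c}, Y) \\
&\quad - I(X_{\mathcal{A}_1}; X_{\mathcal{A}_2} | X_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, Y) - I(X_{\mathcal{A}_1}; \hat{Y}_{\mathcal{A}_2} | X_{\mathcal{A}_1^c}, \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, Y) \\
&\quad - I(X_{\mathcal{A}_2}; \hat{Y}_{\mathcal{A}_1} | X_{\mathcal{A}_2^c}, \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, Y) - I(\hat{Y}_{\mathcal{A}_1}; \hat{Y}_{\mathcal{A}_2} | X_{\mathcal{N}}, \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, Y) \\
&= I(X_{\mathcal{A}_1}; X_{\mathcal{A}_3} | X_{\mathcal{U}^c}, \hat{Y}_{\mathcal{U}^c}, Y) + I(X_{\mathcal{A}_1}; X_{\mathcal{A}_2}, \hat{Y}_{\mathcal{A}_3} | X_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, \hat{Y}_{\mathcal{U}^c}, Y) \\
&\quad + I(X_{\mathcal{A}_3}; \hat{Y}_{\mathcal{A}_1} | X_{(\mathcal{A}_2 \cup \mathcal{A}_3)^c}, \hat{Y}_{\mathcal{U}^c}, Y) + I(\hat{Y}_{\mathcal{A}_1}; X_{\mathcal{A}_2}, \hat{Y}_{\mathcal{A}_3} | X_{\mathcal{A}_2^c}, \hat{Y}_{\mathcal{U}^c}, Y) \\
&\quad - I(X_{\mathcal{A}_1}; X_{\mathcal{A}_2} | X_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, Y) - I(X_{\mathcal{A}_2}; \hat{Y}_{\mathcal{A}_1} | X_{\mathcal{A}_2^c}, \hat{Y}_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, Y) \\
&= I(X_{\mathcal{A}_1}; X_{\mathcal{A}_3} | X_{\mathcal{U}^c}, \hat{Y}_{\mathcal{U}^c}, Y) + I(X_{\mathcal{A}_1}; \hat{Y}_{\mathcal{A}_3} | X_{(\mathcal{A}_1 \cup \mathcal{A}_2)^c}, \hat{Y}_{\mathcal{U}^c}, Y) \\
&\quad + I(X_{\mathcal{A}_3}; \hat{Y}_{\mathcal{A}_1} | X_{(\mathcal{A}_2 \cup \mathcal{A}_3)^c}, \hat{Y}_{\mathcal{U}^c}, Y) + I(\hat{Y}_{\mathcal{A}_1}; \hat{Y}_{\mathcal{A}_3} | X_{\mathcal{A}_2^c}, \hat{Y}_{\mathcal{U}^c}, Y) \\
&\geq 0.
\end{aligned}$$

Thus, we have $J(\mathcal{A}) > J(\mathcal{A}_1 \cup \mathcal{D}^c) \geq \min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$.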

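4) We prove that $\forall \mathcal{A}$ with $\mathcal{A} \cap \mathcal{D} \neq \emptyset$ and $\mathcal{A} \cap \mathcal{D}^c = \mathcal{D}^c$, $J(\mathcal{A}) \geq J(\mathcal{D}^c)$. Letting $\mathcal{A}_1 = \mathcal{A} \cap \mathcal{D}$, we have

$$J(\mathcal{A}) = J(\mathcal{A}_1 \cup \mathcal{D}^c) = J(\mathcal{A}_1) + J(\mathcal{D}^c) - J(\mathcal{A}_1 \circ \mathcal{D}^c).$$

Thus, to show $J(\mathcal{A}) \geq J(\mathcal{D}^c)$, we only need to show $J(\mathcal{A}_1) - J(\mathcal{A}_1 \circ \mathcal{D}^c) \geq 0$. For this, we have

$$\begin{aligned}
& J(\mathcal{A}_1) - J(\mathcal{A}_1 \circ \mathcal{D}^c) \\
&= I(X_{\mathcal{A}_1}; \hat{Y}_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y | X_{\mathcal{D}^c}, X_{\mathcal{D} \setminus \mathcal{A}_1}) - I(\hat{Y}_{\mathcal{A}_1}; Y_{\mathcal{A}_1} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}) \\
&\quad - I(X_{\mathcal{A}_1}, \hat{Y}_{\mathcal{A}_1}; X_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D}^c} | X_{\mathcal{D} \setminus \mathcal{A}_1}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y) \\
&= I(X_{\mathcal{A}_1}; X_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y | X_{\mathcal{D} \setminus \mathcal{A}_1}) - I(\hat{Y}_{\mathcal{A}_1}; Y_{\mathcal{A}_1} | X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}) \\
&\quad - I(X_{\mathcal{A}_1}; X_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D}^c} | X_{\mathcal{D} \setminus \mathcal{A}_1}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y) - I(\hat{Y}_{\mathcal{A}_1}; X_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D}^c} | X_{\mathcal{D}}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y) \\
&= I(X_{\mathcal{A}_1}; \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y | X_{\mathcal{D} \setminus \mathcal{A}_1}) - I(\hat{Y}_{\mathcal{A}_1}; X_{\mathcal{D}^c}, \hat{Y}_{\mathcal{D}^c}, Y_{\mathcal{A}_1} | X_{\mathcal{D}}, \hat{Y}_{\mathcal{D} \setminus \mathcal{A}_1}, Y) \\
&= J_{\mathcal{D}}(\mathcal{A}_1) \\
&\geq 0,
\end{aligned}$$

and thus $J(\mathcal{A}) \geq J(\mathcal{D}^c)$.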

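Combining 2)-4), we can conclude that $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$.

ii) We now argue that under the optimal $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$ that achieves $R^*_{R/U/J}$, if $\mathcal{D}^c \neq \emptyset$, then
$R^*_{R/U/J}$ is not optimal; and hence $\mathcal{D}^c$ must be $\emptyset$.

Suppose $\mathcal{D}^c \neq \emptyset$ at the optimum. Then, $\mathcal{D}^c \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$ and $\bigcap_{\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})} \mathcal{T} = \mathcal{D}^c$. Therefore,

$$\begin{aligned}
R^*_{R/U/J} &= I(X, X_{\mathcal{D}^c}; \hat{Y}_{\mathcal{D}}, Y | X_{\mathcal{D}}) - I(\hat{Y}_{\mathcal{D}^c}; Y_{\mathcal{D}^c} | X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{D}}) \qquad (42) \\
&= I(X, X_{\mathcal{T}}; \hat{Y}_{\mathcal{T}^c}, Y | X_{\mathcal{T}^c}) - I(\hat{Y}_{\mathcal{T}}; Y_{\mathcal{T}} | X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{T}^c}), \qquad (43)
\end{aligned}$$

for any $\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$, $\mathcal{T} \neq \mathcal{D}^c$.

We argue that a higher rate can be achieved. Consider $\hat{Y}'_1, \hat{Y}'_2, \ldots, \hat{Y}'_n$, where $\hat{Y}'_i = \hat{Y}_i$ for any $i \in \mathcal{D}$, and
$\hat{Y}'_i = \hat{Y}_i$ with probability $p$ and $\hat{Y}'_i = \emptyset$ with probability $1 - p$ for any $i \in \mathcal{D}^c$. When $p = 1$, the achievable
rate with $\hat{Y}'_1, \hat{Y}'_2, \ldots, \hat{Y}'_n$ is $R^*_{R/U/J}$. As $p$ decreases from 1, in (42) and (43), both

$$I(X, X_{\mathcal{D}^c}; \hat{Y}'_{\mathcal{D}}, Y | X_{\mathcal{D}}) - I(\hat{Y}'_{\mathcal{D}^c}; Y_{\mathcal{D}^c} | X, X_{\mathcal{N}}, Y, \hat{Y}'_{\mathcal{D}})$$

and

$$I(X, X_{\mathcal{T}}; \hat{Y}'_{\mathcal{T}^c}, Y | X_{\mathcal{T}^c}) - I(\hat{Y}'_{\mathcal{T}}; Y_{\mathcal{T}} | X, X_{\mathcal{N}}, Y, \hat{Y}'_{\mathcal{T}^c})$$

will increase, where $\mathcal{T} \in \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$, $\mathcal{T} \neq \mathcal{D}^c$. Thus, no matter how

$$I(X, X_{\mathcal{S}}; \hat{Y}'_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}'_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{N}}, Y, \hat{Y}'_{\mathcal{S}^c})$$

changes as $p$ decreases for $\mathcal{S} \notin \arg\min_{\mathcal{S} \subseteq \mathcal{N}} J(\mathcal{S})$, it is certain that there exists a $p^*$ such that the achievable
rate by using $\hat{Y}'_1, \hat{Y}'_2, \ldots, \hat{Y}'_n$ is larger than $R^*_{R/U/J}$. This contradicts the optimality of $R^*_{R/U/J}$,
and thus at the optimum, $\mathcal{D}^c$ must be $\emptyset$, i.e., $J(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{N}$. This completes the proof of Theorem 2.6. ■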



B. Cumulative encoding/block-by-block backward decoding/compression-message joint decoding and
Necessity of joint decodability

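Some notations and lemmas are introduced to facilitate the later discussion. Let

$$\begin{aligned}
K_{\mathcal{A},\mathcal{B}}(\mathcal{S}) &:= I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, \hat{Y}_{\mathcal{A}}, Y | X, X_{\mathcal{A}}, X_{\mathcal{B} \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}),\ \forall \mathcal{S} \subseteq \mathcal{B}, \\
K_{\mathcal{B}}(\mathcal{S}) &:= K_{\emptyset,\mathcal{B}}(\mathcal{S}) = I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y | X, X_{\mathcal{B} \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y),\ \forall \mathcal{S} \subseteq \mathcal{B}, \\
R_{\mathcal{B}}(\mathcal{S}) &:= I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y | X_{\mathcal{B} \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}}, Y),\ \forall \mathcal{S} \subseteq \mathcal{B}.
\end{aligned}$$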



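Lemma 4.5: 1) If $K_{\mathcal{A}}(\mathcal{S}_1) > 0$ for any nonempty $\mathcal{S}_1 \subseteq \mathcal{A}$, and $K_{\mathcal{B}}(\mathcal{S}_2) > 0$ for any nonempty $\mathcal{S}_2 \subseteq \mathcal{B}$,
then $K_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) > 0$ for any nonempty $\mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.

2) If $K_{\mathcal{A}}(\mathcal{S}_1) > 0$ for any nonempty $\mathcal{S}_1 \subseteq \mathcal{A}$, and $K_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) > 0$ for any nonempty $\mathcal{S}_2 \subseteq \mathcal{B}$, then
$K_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) > 0$ for any nonempty $\mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.

Lemma 4.6: Under any $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$, there exists a unique set $\mathcal{D}_J$, which is the largest
subset of $\mathcal{N}$ satisfying

$$K_{\mathcal{D}_J}(\mathcal{S}) > 0,\ \forall \mathcal{S} \subseteq \mathcal{D}_J,\ \mathcal{S} \neq \emptyset.$$

Lemma 4.7: If $K_{\mathcal{A},\mathcal{B}}(\mathcal{B}) > 0$ for some nonempty $\mathcal{B}$, then there exists some nonempty $\mathcal{C} \subseteq \mathcal{B}$ such that
$K_{\mathcal{A},\mathcal{C}}(\mathcal{S}) > 0$ for any nonempty $\mathcal{S} \subseteq \mathcal{C}$.

Lemma 4.8: For any disjoint $\mathcal{A}$ and $\mathcal{B}$, and any $\mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$, let $\mathcal{S}_1 = \mathcal{S} \cap \mathcal{A}$ and $\mathcal{S}_2 = \mathcal{S} \cap \mathcal{B}$. Then, we
have:

1) $R_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq R_{\mathcal{A}}(\mathcal{S}_1) + K_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}_2)$.

2) In particular, when $\mathcal{S}_2 = \mathcal{B}$, $R_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) = R_{\mathcal{A}}(\mathcal{S}_1) + K_{\mathcal{A},\mathcal{B}}(\mathcal{B})$.

Lemmas 4.5-4.7 can be proved along the same lines as the proofs of Lemmas 4.1-4.3 respectively,
while the proof of Lemma 4.8 is given in Appendix C.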

The cumulative encoding/block-by-block backward decoding/compression-message joint decoding scheme 
is presented in the following proof. 



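Proof of Theorem 2.7: The uniqueness of $\mathcal{D}_J$ has been established in Lemma 4.6. Below, we focus
on showing that i) the rate in (15) is achievable, and ii) the compressions in the set $\mathcal{D}_J$ can be decoded
jointly with $X$.

To make the presentation easier to follow, we first consider the case when $\mathcal{D}_J = \mathcal{N}$, i.e., the case when

$$I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X, X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}) > 0,\ \forall \mathcal{S} \subseteq \mathcal{N},\ \mathcal{S} \neq \emptyset, \qquad (44)$$

and show that

$$R_{C/B/J} < \min_{\mathcal{S} \subseteq \mathcal{N}} I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{S}^c}, Y | X_{\mathcal{S}^c}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{S}^c}) \qquad (45)$$

is achievable. The case of $\mathcal{D}_J \neq \mathcal{N}$ will follow immediately after the case of $\mathcal{D}_J = \mathcal{N}$ is treated.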
Fix $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$. Assume (44) holds. The codebook generation and encoding process
here are exactly the same as those in the proof of Theorem 2.5 and hence omitted. For the decoding,
the destination finds the unique message vector $\mathbf{m} = (m_1, m_2, \ldots, m_B)$ and some $\mathbf{l}^{B+M} = (\mathbf{l}_1, \ldots, \mathbf{l}_{B+M})$
such that for any $b = 1, \ldots, B + M$,

$$\big(\mathbf{x}_b(m_b), (\mathbf{x}_{1,b}(l_{1,b-1}), \hat{\mathbf{y}}_{1,b}(l_{1,b}|l_{1,b-1})), \ldots, (\mathbf{x}_{n,b}(l_{n,b-1}), \hat{\mathbf{y}}_{n,b}(l_{n,b}|l_{n,b-1})), \mathbf{y}_b\big) \in A_\epsilon(X, X_{\mathcal{N}}, \hat{Y}_{\mathcal{N}}, Y), \qquad (46)$$

where $m_b = 1$ is the dummy message for all $b \in [B + 1 : B + M]$.
Again, this can be done backwards as follows.

a) The destination first finds the unique $\mathbf{l}_B$ such that there exists some $\mathbf{l}_{B+1}^{B+M} = (\mathbf{l}_{B+1}, \ldots, \mathbf{l}_{B+M})$
satisfying (46) for any $b = B + 1, \ldots, B + M$. Along similar lines as in the proof of Theorem 2.5,
with $\mathbf{x}_b(m_b)$, $b \in [B + 1 : B + M]$, taken into account and treated as known signals, it follows that $\mathbf{l}_B$ can
be decoded if (44) holds.

b) Backwards and sequentially from block $b = B$ to block $b = 1$, the destination finds the unique pair
$(m_b, \mathbf{l}_{b-1})$ such that $(m_b, \mathbf{l}_{b-1})$ satisfies (46), where $\mathbf{l}_b$ has already been recovered due to the backwards
property of decoding.

At each block $b = B, B - 1, \ldots, 1$, an error occurs with $m_b$ if the true $m_b$ does not satisfy (46) with any
$\mathbf{l}_{b-1}$, or a false $m_b$ satisfies (46) with some $\mathbf{l}_{b-1}$. According to the properties of typical sequences, the true
$(m_b, \mathbf{l}_{b-1})$ satisfies (46) with high probability.

For a false $m_b$ and an $\mathbf{l}_{b-1}$ with false $\{l_{i,b-1}, i \in \mathcal{S}\}$ but true $\{l_{i,b-1}, i \in \mathcal{S}^c\}$, $\mathbf{X}_b(m_b)$ and $(\mathbf{X}_b(\mathcal{S}), \hat{\mathbf{Y}}_b(\mathcal{S}))$
and $(\mathbf{X}_b(\mathcal{S}^c), \hat{\mathbf{Y}}_b(\mathcal{S}^c), \mathbf{Y}_b)$ are mutually independent, and the probability that $(m_b, \mathbf{l}_{b-1})$ satisfies (46) can
be upper bounded by

$$2^{T(H(X, X_{\mathcal{N}}, \hat{Y}_{\mathcal{N}}, Y) + \epsilon)}\, 2^{-T(H(X) - \epsilon)}\, 2^{-T(H(X_{\mathcal{S}^c}, \hat{Y}_{\mathcal{S}^c}, Y) - \epsilon)}\, 2^{-T(H(X_{\mathcal{S}}) - \epsilon)}\, 2^{-T\sum_{i \in \mathcal{S}}(H(\hat{Y}_i|X_i) - \epsilon)}.$$

Since the number of such false $(m_b, \mathbf{l}_{b-1})$ is upper bounded by $2^{TR_{C/B/J}} \prod_{i \in \mathcal{S}} 2^{T(I(\hat{Y}_i; Y_i|X_i) + \epsilon)}$, with the union
bound, it is easy to check that the probability of finding a false $m_b$ goes to zero as $T \to \infty$, if (45) holds.

Then, based on the recovered $m_b$ and again from the proof of Theorem 2.5, with $\mathbf{x}_b(m_b)$ taken into
account and treated as a known signal, it follows that $\mathbf{l}_{b-1}$ can be decoded if (44) holds.

Combining a) and b), we can conclude that both $\mathbf{m}$ and $\mathbf{l}^B$ can be decoded if both (44) and (45) hold.



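If under $p(x)\prod_{i=1}^{n} p(x_i)p(\hat{y}_i|x_i, y_i)$, $\mathcal{D}_J \neq \mathcal{N}$, then along the same lines as above with $\mathcal{N}$ replaced by
$\mathcal{D}_J$, it readily follows that

$$R_{C/B/J} < \min_{\mathcal{S} \subseteq \mathcal{D}_J} I(X, X_{\mathcal{S}}; \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}, Y | X_{\mathcal{D}_J \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{D}_J}, Y, \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}})$$

is achievable; and $\hat{Y}_{\mathcal{D}_J}$, or more strictly, $\{l_i, i \in \mathcal{D}_J\}$, can be decoded jointly with $X$ since

$$I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}, Y | X, X_{\mathcal{D}_J \setminus \mathcal{S}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X, X_{\mathcal{D}_J}, Y, \hat{Y}_{\mathcal{D}_J \setminus \mathcal{S}}) > 0,$$

for any nonempty $\mathcal{S} \subseteq \mathcal{D}_J$. ■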
Now, we demonstrate that only those relay nodes, whose compressions can be eventually decoded, are 
helpful to the decoding of the original message. 



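Proof of Theorem 2.8: The uniqueness of $\mathcal{D}_J$ has been treated in Lemma 4.6, while the uniqueness
of $\mathcal{D}'_J$ can be established along the same lines. To prove Theorem 2.8 in terms of the notations defined
in this section, we will sequentially prove that: i) $\max_{\mathcal{M} \subseteq \mathcal{N}} \min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) = \min_{\mathcal{S} \subseteq \mathcal{D}_J} R_{\mathcal{D}_J}(\mathcal{S})$; ii)
$\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) < \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S})$, for any $\mathcal{M} \nsubseteq \mathcal{D}'_J$; iii) $\max_{\mathcal{M} \subseteq \mathcal{N}} \min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) = \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S})$.

i) We prove $\max_{\mathcal{M} \subseteq \mathcal{N}} \min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) = \min_{\mathcal{S} \subseteq \mathcal{D}_J} R_{\mathcal{D}_J}(\mathcal{S})$ by proving that: 1) For any $\mathcal{M}$ with $\mathcal{M} \cap \mathcal{D}_J = \mathcal{D}_J$,
$\mathcal{M} \neq \mathcal{D}_J$, $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{D}_J} R_{\mathcal{D}_J}(\mathcal{S})$. 2) For any $\mathcal{M}$ with $\mathcal{M} \cap \mathcal{D}_J \neq \mathcal{D}_J$, $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}_J} R_{\mathcal{M} \cup \mathcal{D}_J}(\mathcal{S})$, and thus $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{D}_J} R_{\mathcal{D}_J}(\mathcal{S})$ by 1). The details are as follows.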

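1) Assume $\mathcal{M} \cap \mathcal{D}_J = \mathcal{D}_J$, $\mathcal{M} \neq \mathcal{D}_J$. We show $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{D}_J} R_{\mathcal{D}_J}(\mathcal{S})$ by showing that
for any $\mathcal{S} \subseteq \mathcal{D}_J$, $R_{\mathcal{M}}(\mathcal{S} \cup (\mathcal{M} \setminus \mathcal{D}_J)) \leq R_{\mathcal{D}_J}(\mathcal{S})$.
For any $\mathcal{S} \subseteq \mathcal{D}_J$, by Part 2) of Lemma 4.8, we have

$$R_{\mathcal{M}}(\mathcal{S} \cup (\mathcal{M} \setminus \mathcal{D}_J)) = R_{\mathcal{D}_J \cup (\mathcal{M} \setminus \mathcal{D}_J)}(\mathcal{S} \cup (\mathcal{M} \setminus \mathcal{D}_J)) = R_{\mathcal{D}_J}(\mathcal{S}) + K_{\mathcal{D}_J, \mathcal{M} \setminus \mathcal{D}_J}(\mathcal{M} \setminus \mathcal{D}_J).$$

We argue $K_{\mathcal{D}_J, \mathcal{M} \setminus \mathcal{D}_J}(\mathcal{M} \setminus \mathcal{D}_J) \leq 0$ by contradiction. Suppose $K_{\mathcal{D}_J, \mathcal{M} \setminus \mathcal{D}_J}(\mathcal{M} \setminus \mathcal{D}_J) > 0$. Then, by Lemma 4.7,
there exists some nonempty $\mathcal{C} \subseteq \mathcal{M} \setminus \mathcal{D}_J$ such that $K_{\mathcal{D}_J, \mathcal{C}}(\mathcal{S}) > 0$ for any nonempty $\mathcal{S} \subseteq \mathcal{C}$.
By Part 2) of Lemma 4.5, this further implies that $K_{\mathcal{D}_J \cup \mathcal{C}}(\mathcal{S}) > 0$ for any nonempty $\mathcal{S} \subseteq \mathcal{D}_J \cup \mathcal{C}$,
which contradicts the definition of $\mathcal{D}_J$. Thus, we must have $K_{\mathcal{D}_J, \mathcal{M} \setminus \mathcal{D}_J}(\mathcal{M} \setminus \mathcal{D}_J) \leq 0$, and
$R_{\mathcal{M}}(\mathcal{S} \cup (\mathcal{M} \setminus \mathcal{D}_J)) \leq R_{\mathcal{D}_J}(\mathcal{S})$.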

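2) Assume $\mathcal{M} \cap \mathcal{D}_J \neq \mathcal{D}_J$. For any $\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}_J$, let $\mathcal{S}_1 = \mathcal{S} \cap \mathcal{M}$ and $\mathcal{S}_2 = \mathcal{S} \cap (\mathcal{D}_J \setminus \mathcal{M})$. By Part 1) of
Lemma 4.8, we have

$$R_{\mathcal{M} \cup \mathcal{D}_J}(\mathcal{S}) = R_{\mathcal{M} \cup (\mathcal{D}_J \setminus \mathcal{M})}(\mathcal{S}) \geq R_{\mathcal{M}}(\mathcal{S}_1) + K_{\mathcal{M} \cup \mathcal{D}_J}(\mathcal{S}_2),$$

and then,

$$\begin{aligned}
\min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}_J} R_{\mathcal{M} \cup \mathcal{D}_J}(\mathcal{S}) &\geq \min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}_J} \{R_{\mathcal{M}}(\mathcal{S}_1) + K_{\mathcal{M} \cup \mathcal{D}_J}(\mathcal{S}_2)\} \\
&\geq \min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}_J} \{R_{\mathcal{M}}(\mathcal{S}_1) + K_{\mathcal{D}_J}(\mathcal{S}_2)\} \\
&= \min_{\mathcal{S}_1 \subseteq \mathcal{M},\ \mathcal{S}_2 \subseteq \mathcal{D}_J \setminus \mathcal{M}} \{R_{\mathcal{M}}(\mathcal{S}_1) + K_{\mathcal{D}_J}(\mathcal{S}_2)\} \\
&= \min_{\mathcal{S}_1 \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}_1) + \min_{\mathcal{S}_2 \subseteq \mathcal{D}_J \setminus \mathcal{M}} K_{\mathcal{D}_J}(\mathcal{S}_2) \\
&\geq \min_{\mathcal{S}_1 \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}_1),
\end{aligned}$$

where the last inequality follows from the fact that $K_{\mathcal{D}_J}(\mathcal{S}_2) > 0$ for any nonempty $\mathcal{S}_2 \subseteq \mathcal{D}_J$.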

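ii) We can prove $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) < \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S})$, for any $\mathcal{M} \nsubseteq \mathcal{D}'_J$, by two similar steps as follows.

1) Along similar lines as in Step 1) of Part i), we can prove $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) < \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S})$,
for any $\mathcal{M}$ with $\mathcal{M} \cap \mathcal{D}'_J = \mathcal{D}'_J$, $\mathcal{M} \neq \mathcal{D}'_J$. The only difference is that here the inequality is strict, but it can be
easily justified by noting that "=" is included in the definition of $\mathcal{D}'_J$.

2) From Step 2) of Part i), it can be similarly proved that for any $\mathcal{M}$ with $\mathcal{M} \cap \mathcal{D}'_J \neq \mathcal{D}'_J$, $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq
\min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}'_J} R_{\mathcal{M} \cup \mathcal{D}'_J}(\mathcal{S})$. Therefore, if, further, $\mathcal{M} \nsubseteq \mathcal{D}'_J$, then by 1) we have

$$\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}'_J} R_{\mathcal{M} \cup \mathcal{D}'_J}(\mathcal{S}) < \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S}).$$

iii) From Part ii), we have 1) $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) < \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S})$, for any $\mathcal{M}$ with $\mathcal{M} \cap \mathcal{D}'_J = \mathcal{D}'_J$, $\mathcal{M} \neq \mathcal{D}'_J$, and
2) for any $\mathcal{M}$ with $\mathcal{M} \cap \mathcal{D}'_J \neq \mathcal{D}'_J$, $\min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{D}'_J} R_{\mathcal{M} \cup \mathcal{D}'_J}(\mathcal{S}) \leq \min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S})$. Thus, it follows
immediately that $\min_{\mathcal{S} \subseteq \mathcal{D}'_J} R_{\mathcal{D}'_J}(\mathcal{S}) = \max_{\mathcal{M} \subseteq \mathcal{N}} \min_{\mathcal{S} \subseteq \mathcal{M}} R_{\mathcal{M}}(\mathcal{S})$. This completes the proof of Theorem
2.8. ■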

V. Conclusion 

Joint compression-message decoding introduced more freedom in selecting the compressions at the re- 
lays. Motivated by it, we have investigated the problem of finding the optimal compressions in maximizing 
the achievable rate of the original message. We have studied several different compress-and-forward relay 
schemes, and the unanimous conclusion is that the optimal compressions should always support successive 
compression-message decoding. In situations where compressions not supporting successive decoding have 
to be used, we have found that only those that can be jointly decoded are helpful to the decoding of the 
original message. 

We have also developed a backward block-by-block decoding scheme. Compared to the repetitive 
encoding/all blocks united decoding scheme recently proposed in [3], which improved the achievable 
rate in the multiple-relay case, we have realized that the key to the improvement comes from delaying 
the decoding until all the blocks have been finished. In retrospect, the multiple-relay case is different 
from the single-relay case in that it may take multiple blocks for the relays to help each other before
their compressions can finally reach the destination. Hence, the block-by-block forward decoding scheme, 
which is sufficient for the single-relay case, may not work satisfactorily for multiple relays in general. 

Finally, we need to point out that our discussion of optimality is restricted to the few selected
compress-and-forward relay schemes. In generalizing the classical compress-and-forward relay scheme in [2] to the
case of multiple relays, there could be many other choices of coding considerations. Even for the
single-relay case, the optimality of the original compression method used in [2] remains an open question.



Appendix A
Proofs of Lemmas 3.1-3.4



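Proof of Lemma 3.1:

For any $\mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$, let $\mathcal{S}_1 = \mathcal{S} \cap \mathcal{A}$ and $\mathcal{S}_2 = \mathcal{S} \cap (\mathcal{B} \setminus \mathcal{A})$. Then,

$$\begin{aligned}
I_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) &= \sum_{i \in \mathcal{S}} R_i - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B}) \setminus \mathcal{S}}, Y) \\
&= \sum_{i \in \mathcal{S}_1} R_i - I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B}) \setminus \mathcal{S}}, Y) + \sum_{i \in \mathcal{S}_2} R_i - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B}) \setminus \mathcal{S}}, \hat{Y}_{\mathcal{S}_1}, Y) \\
&\geq \sum_{i \in \mathcal{S}_1} R_i - I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y) + \sum_{i \in \mathcal{S}_2} R_i - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y) \\
&= I_{\mathcal{A}}(\mathcal{S}_1) + I_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) \qquad (47) \\
&\geq I_{\mathcal{A}}(\mathcal{S}_1) + I_{\mathcal{B}}(\mathcal{S}_2). \qquad (48)
\end{aligned}$$

If $I_{\mathcal{A}}(\mathcal{S}_1) \geq 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $I_{\mathcal{B}}(\mathcal{S}_2) \geq 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then following (48), $I_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$. If
$I_{\mathcal{A}}(\mathcal{S}_1) \geq 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $I_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) \geq 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then following (47), $I_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$. ■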
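Proof of Lemma 3.2: Let $\mathcal{L} := \{\mathcal{F} \subseteq \mathcal{N} : I_{\mathcal{F}}(\mathcal{S}) \geq 0, \forall \mathcal{S} \subseteq \mathcal{F}\}$ and $\mathcal{L}_{\max} := \{\mathcal{D} \in \mathcal{L} : |\mathcal{D}| = \max_{\mathcal{F} \in \mathcal{L}} |\mathcal{F}|\}$. Suppose there is more than one element in $\mathcal{L}_{\max}$, say, $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n$, where $n \geq 2$. Then,
based on Part 1) of Lemma 3.1, $\mathcal{D} := \bigcup_{i=1}^{n} \mathcal{D}_i$ also satisfies $I_{\mathcal{D}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{D}$, which is a contradiction,
and hence Lemma 3.2 is proved. ■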
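For a small number of relays, the set in Lemma 3.2 can be found by exhaustive search. The sketch below is a hedged illustration and not part of the paper: `largest_feasible_set` takes a caller-supplied oracle `I_F(F, S)` for the quantity $I_{\mathcal{F}}(\mathcal{S})$, and `make_gaussian_I_F` is a hypothetical scalar Gaussian toy model (an assumption introduced here) in which $Y = X + Z$, $Y_i = X + Z_i$ and $\hat{Y}_i = Y_i + W_i$ with independent noises, so that the conditional mutual information has a closed form.

```python
from itertools import combinations
from math import log2


def subsets(items):
    """Yield every subset of `items`, including the empty set, as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def largest_feasible_set(relays, I_F):
    """Brute-force the largest F with I_F(F, S) >= 0 for every S contained in F."""
    feasible = [F for F in subsets(relays)
                if all(I_F(F, S) >= 0 for S in subsets(F))]
    # Part 1) of Lemma 3.1: feasible sets are closed under union,
    # so their union is itself feasible and is the unique largest one.
    largest = frozenset().union(*feasible)
    assert largest in feasible
    return largest


def make_gaussian_I_F(P, N0, N, q, R):
    """Hypothetical toy oracle: Y = X + Z, Y_i = X + Z_i, Yhat_i = Y_i + W_i.

    P: power of X; N0: noise variance of Y; N[i]: noise variance of Y_i;
    q[i]: variance of the compression noise W_i; R[i]: rate R_i in bits.
    """
    def I_F(F, S):
        if not S:
            return 0.0
        v = {i: N[i] + q[i] for i in F}  # total noise variance seen in Yhat_i
        base = 1.0 / N0 + sum(1.0 / v[i] for i in F - S)
        full = base + sum(1.0 / v[i] for i in S)
        # I(Yhat_S; Y_S | Yhat_{F \ S}, Y) for jointly Gaussian variables.
        mutual_info = (0.5 * sum(log2(v[i] / q[i]) for i in S)
                       + 0.5 * log2((1.0 + P * full) / (1.0 + P * base)))
        return sum(R[i] for i in S) - mutual_info
    return I_F
```

In this toy model a relay with $R_i = 0$ can never belong to the set, because its own singleton constraint already fails, while relays with very large rates always do.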
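Proof of Lemma 3.3: If $I_{\mathcal{A},\mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{B}$, then this lemma obviously holds. Otherwise, if there
exists some $\mathcal{S}_1 \subseteq \mathcal{B}$, $\mathcal{S}_1 \neq \mathcal{B}$, such that $I_{\mathcal{A},\mathcal{B}}(\mathcal{S}_1) < 0$, then we have $I_{\mathcal{A},\mathcal{B}}(\mathcal{B}) - I_{\mathcal{A},\mathcal{B}}(\mathcal{S}_1) > 0$, i.e.,

$$\begin{aligned}
& \sum_{i \in \mathcal{B}} R_i - I(\hat{Y}_{\mathcal{B}}; Y_{\mathcal{B}} | \hat{Y}_{\mathcal{A}}, Y) - \Big(\sum_{i \in \mathcal{S}_1} R_i - I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_1}, Y)\Big) \\
&= \sum_{i \in \mathcal{B} \setminus \mathcal{S}_1} R_i - I(\hat{Y}_{\mathcal{B} \setminus \mathcal{S}_1}; Y_{\mathcal{B} \setminus \mathcal{S}_1} | \hat{Y}_{\mathcal{A}}, Y) \\
&> 0.
\end{aligned}$$

Now, we arrive at the same situation as in the original assumption with $\mathcal{B}$ replaced by $\mathcal{B} \setminus \mathcal{S}_1$. Continuing to
apply this argument, we must be able to reach a nonempty $\mathcal{C} \subseteq \mathcal{B}$ such that $I_{\mathcal{A},\mathcal{C}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{C}$. ■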
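Proof of Lemma 3.4: For any disjoint $\mathcal{A}$ and $\mathcal{B}$,

$$\begin{aligned}
I(\mathcal{A} \cup \mathcal{B}) &= \sum_{i \in \mathcal{A} \cup \mathcal{B}} R_i - I(Y_{\mathcal{A} \cup \mathcal{B}}; \hat{Y}_{\mathcal{A} \cup \mathcal{B}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) \\
&= \sum_{i \in \mathcal{A}} R_i - I(Y_{\mathcal{A} \cup \mathcal{B}}; \hat{Y}_{\mathcal{A}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) + \sum_{i \in \mathcal{B}} R_i - I(Y_{\mathcal{A} \cup \mathcal{B}}; \hat{Y}_{\mathcal{B}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, \hat{Y}_{\mathcal{A}}, Y) \\
&= \sum_{i \in \mathcal{A}} R_i - I(\hat{Y}_{\mathcal{A}}; \hat{Y}_{\mathcal{B}}, Y_{\mathcal{A}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) + \sum_{i \in \mathcal{B}} R_i - I(\hat{Y}_{\mathcal{B}}; Y_{\mathcal{B}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, \hat{Y}_{\mathcal{A}}, Y) \\
&= \sum_{i \in \mathcal{A}} R_i - I(\hat{Y}_{\mathcal{B}}; \hat{Y}_{\mathcal{A}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y) - I(\hat{Y}_{\mathcal{A}}; Y_{\mathcal{A}} | \hat{Y}_{\mathcal{A}^c}, Y) + \sum_{i \in \mathcal{B}} R_i - I(\hat{Y}_{\mathcal{B}}; Y_{\mathcal{B}} | \hat{Y}_{\mathcal{B}^c}, Y) \\
&= I(\mathcal{A}) + I(\mathcal{B}) - I(\hat{Y}_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} | \hat{Y}_{(\mathcal{A} \cup \mathcal{B})^c}, Y),
\end{aligned}$$

which proves the lemma. ■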
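The identity in Lemma 3.4 can be sanity-checked numerically on a small discrete example. The sketch below is only an illustration under assumptions introduced here: two relays with binary alphabets, $\mathcal{A} = \{1\}$, $\mathcal{B} = \{2\}$ (so $(\mathcal{A} \cup \mathcal{B})^c = \emptyset$), a random joint pmf of $(Y, Y_1, Y_2)$, and random channels $p(\hat{y}_i|y_i)$; all function names are hypothetical.

```python
import itertools
import math
import random


def entropy(pmf, coords):
    """Entropy in bits of the marginal of `pmf` on the coordinate positions `coords`."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[c] for c in coords)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mi(pmf, a, b, c):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), with a, b, c coordinate lists."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def random_two_relay_pmf(rng):
    """Random binary pmf on (Y, Y1, Y2, Yhat1, Yhat2) with Yhat_i depending only on Y_i."""
    weights = [rng.random() + 0.01 for _ in range(8)]
    total = sum(weights)
    # channels[i][y_i] is the probability that Yhat_i = 0 given Y_i = y_i.
    channels = [[rng.uniform(0.05, 0.95) for _ in range(2)] for _ in range(2)]
    pmf = {}
    for (y, y1, y2), w in zip(itertools.product((0, 1), repeat=3), weights):
        for h1, h2 in itertools.product((0, 1), repeat=2):
            p1 = channels[0][y1] if h1 == 0 else 1 - channels[0][y1]
            p2 = channels[1][y2] if h2 == 0 else 1 - channels[1][y2]
            pmf[(y, y1, y2, h1, h2)] = (w / total) * p1 * p2
    return pmf


def lemma_3_4_gap(pmf, R1, R2):
    """I(A u B) - [I(A) + I(B) - I(Yhat_A; Yhat_B | Y)] for A = {1}, B = {2}."""
    Y, Y1, Y2, H1, H2 = [0], [1], [2], [3], [4]
    I_A = R1 - cond_mi(pmf, H1, Y1, H2 + Y)
    I_B = R2 - cond_mi(pmf, H2, Y2, H1 + Y)
    I_AB = R1 + R2 - cond_mi(pmf, H1 + H2, Y1 + Y2, Y)
    return I_AB - (I_A + I_B - cond_mi(pmf, H1, H2, Y))
```

The gap should vanish up to floating-point error for every such pmf, because the derivation only uses the chain rule and the Markov structure $\hat{Y}_i - Y_i - (\text{everything else})$.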



Appendix B
Proofs of Lemmas 4.1-4.4



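Proof of Lemma 4.1:
For any $\mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$, let $\mathcal{S}_1 = \mathcal{S} \cap \mathcal{A}$ and $\mathcal{S}_2 = \mathcal{S} \cap (\mathcal{B} \setminus \mathcal{A})$. For brevity, write $\mathcal{U} = (\mathcal{A} \cup \mathcal{B}) \setminus \mathcal{S}$ and $\mathcal{V} = (\mathcal{B} \cap \mathcal{A}^c) \setminus \mathcal{S}_2$. Then,

$$\begin{aligned}
J_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) &= I(X_{\mathcal{S}}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{U}}) - I(\hat{Y}_{\mathcal{S}}; Y_{\mathcal{S}} | X_{\mathcal{A} \cup \mathcal{B}}, \hat{Y}_{\mathcal{U}}, Y) \\
&= I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{U}}) + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{S}_1}, X_{\mathcal{U}}) \\
&\quad - I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | X_{\mathcal{A} \cup \mathcal{B}}, \hat{Y}_{\mathcal{U}}, Y) - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A} \cup \mathcal{B}}, \hat{Y}_{\mathcal{S}_1}, \hat{Y}_{\mathcal{U}}, Y) \\
&= I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{U}}) + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{S}_1}, X_{\mathcal{U}}) \\
&\quad - [I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y) - I(\hat{Y}_{\mathcal{S}_1}; X_{\mathcal{B} \setminus \mathcal{A}}, \hat{Y}_{\mathcal{V}} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A} \cup \mathcal{B}}, \hat{Y}_{\mathcal{S}_1}, \hat{Y}_{\mathcal{U}}, Y) \\
&= [I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{U}}) - I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad + [I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{S}_1}, X_{\mathcal{U}}) + I(\hat{Y}_{\mathcal{S}_1}; X_{\mathcal{B} \setminus \mathcal{A}}, \hat{Y}_{\mathcal{V}} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y) \\
&\geq [I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y | X_{\mathcal{A} \setminus \mathcal{S}_1}) - I(\hat{Y}_{\mathcal{S}_1}; Y_{\mathcal{S}_1} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad + [I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{S}_1}, X_{\mathcal{U}}) + I(\hat{Y}_{\mathcal{S}_1}; X_{\mathcal{B} \setminus \mathcal{A}}, \hat{Y}_{\mathcal{V}} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y) \\
&= [I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{S}_1}, X_{\mathcal{U}}) + I(\hat{Y}_{\mathcal{S}_1}; X_{\mathcal{S}_2}, X_{\mathcal{V}}, \hat{Y}_{\mathcal{V}} | X_{\mathcal{A}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y) + J_{\mathcal{A}}(\mathcal{S}_1) \\
&\geq [I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{U}}, Y | X_{\mathcal{A}}, X_{\mathcal{B} \setminus \mathcal{S}_2}) + I(\hat{Y}_{\mathcal{S}_1}; X_{\mathcal{S}_2} | X_{\mathcal{A}}, X_{\mathcal{V}}, \hat{Y}_{\mathcal{V}}, \hat{Y}_{\mathcal{A} \setminus \mathcal{S}_1}, Y)] \\
&\quad - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y) + J_{\mathcal{A}}(\mathcal{S}_1) \\
&= I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y | X_{\mathcal{A}}, X_{\mathcal{B} \setminus \mathcal{S}_2}) - I(\hat{Y}_{\mathcal{S}_2}; Y_{\mathcal{S}_2} | X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B} \setminus \mathcal{S}_2}, Y) + J_{\mathcal{A}}(\mathcal{S}_1) \\
&= J_{\mathcal{A}}(\mathcal{S}_1) + J_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) \qquad (49) \\
&\geq J_{\mathcal{A}}(\mathcal{S}_1) + J_{\mathcal{B}}(\mathcal{S}_2). \qquad (50)
\end{aligned}$$

If $J_{\mathcal{A}}(\mathcal{S}_1) \geq 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $J_{\mathcal{B}}(\mathcal{S}_2) \geq 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then following (50), $J_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$.
If $J_{\mathcal{A}}(\mathcal{S}_1) \geq 0$, $\forall \mathcal{S}_1 \subseteq \mathcal{A}$, and $J_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2) \geq 0$, $\forall \mathcal{S}_2 \subseteq \mathcal{B}$, then following (49), $J_{\mathcal{A} \cup \mathcal{B}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$. ■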
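Proof of Lemma 4.2: Let $\mathcal{L} := \{\mathcal{F} \subseteq \mathcal{N} : J_{\mathcal{F}}(\mathcal{S}) \geq 0, \forall \mathcal{S} \subseteq \mathcal{F}\}$ and $\mathcal{L}_{\max} := \{\mathcal{D} \in \mathcal{L} : |\mathcal{D}| = \max_{\mathcal{F} \in \mathcal{L}} |\mathcal{F}|\}$. Suppose there is more than one element in $\mathcal{L}_{\max}$, say, $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n$, where $n \geq 2$.
Then, based on Part 1) of Lemma 4.1, $\mathcal{D} := \bigcup_{i=1}^{n} \mathcal{D}_i$ also satisfies $J_{\mathcal{D}}(\mathcal{S}) \geq 0$, $\forall \mathcal{S} \subseteq \mathcal{D}$, which is a
contradiction, and hence Lemma 4.2 is proved. ■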
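Proof of Lemma 4.3: If $J_{\mathcal{A},\mathcal{B}}(\mathcal{S}) > 0$, $\forall \mathcal{S} \subseteq \mathcal{B}$, then this lemma obviously holds. Otherwise, if there exists some $\mathcal{S}_1 \subseteq \mathcal{B}$, $\mathcal{S}_1 \neq \mathcal{B}$, such that $J_{\mathcal{A},\mathcal{B}}(\mathcal{S}_1) \leq 0$, then we have $J_{\mathcal{A},\mathcal{B}}(\mathcal{B}) - J_{\mathcal{A},\mathcal{B}}(\mathcal{S}_1) > 0$, i.e.,

\begin{align*}
& I(X_{\mathcal{B}}; \hat{Y}_{\mathcal{A}}, Y \mid X_{\mathcal{A}}) - I(Y_{\mathcal{B}}; \hat{Y}_{\mathcal{B}} \mid X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}) \\
& \quad - I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{A}}, Y \mid X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_1}) + I(Y_{\mathcal{S}_1}; \hat{Y}_{\mathcal{S}_1} \mid X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1}) \\
={}& I(X_{\mathcal{B}\setminus\mathcal{S}_1}; \hat{Y}_{\mathcal{A}}, Y \mid X_{\mathcal{A}}) + I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{A}}, Y \mid X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_1}) \\
& - I(Y_{\mathcal{B}\setminus\mathcal{S}_1}; \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1} \mid X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}) - I(Y_{\mathcal{S}_1}; \hat{Y}_{\mathcal{S}_1} \mid X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1}) \\
& - I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{A}}, Y \mid X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_1}) - I(X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1} \mid \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_1}) + I(Y_{\mathcal{S}_1}; \hat{Y}_{\mathcal{S}_1} \mid X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1}) \\
={}& I(X_{\mathcal{B}\setminus\mathcal{S}_1}; \hat{Y}_{\mathcal{A}}, Y \mid X_{\mathcal{A}}) - I(Y_{\mathcal{B}\setminus\mathcal{S}_1}; \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_1} \mid \hat{Y}_{\mathcal{A}}, Y, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_1}) \\
={}& J_{\mathcal{A},\mathcal{B}\setminus\mathcal{S}_1}(\mathcal{B}\setminus\mathcal{S}_1) \\
>{}& 0.
\end{align*}

Now, we arrive at the same situation as in the original assumption with $\mathcal{B}$ replaced by $\mathcal{B}\setminus\mathcal{S}_1$. Continuing to apply this argument, we must be able to reach a nonempty $\mathcal{C} \subseteq \mathcal{B}$ such that $J_{\mathcal{A},\mathcal{C}}(\mathcal{S}) > 0$, $\forall \mathcal{S} \subseteq \mathcal{C}$.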



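Proof of Lemma 4.4: For any disjoint $\mathcal{A}$ and $\mathcal{B}$,

\begin{align*}
J(\mathcal{A}\circ\mathcal{B})
={}& J(\mathcal{A}) + J(\mathcal{B}) - J(\mathcal{A}\cup\mathcal{B}) \\
={}& I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{A}^c}, Y \mid X_{\mathcal{A}^c}) - I(Y_{\mathcal{A}}; \hat{Y}_{\mathcal{A}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{A}^c}) \\
& + I(X_{\mathcal{B}}; \hat{Y}_{\mathcal{B}^c}, Y \mid X_{\mathcal{B}^c}) - I(Y_{\mathcal{B}}; \hat{Y}_{\mathcal{B}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{B}^c}) \\
& - I(X_{\mathcal{B}}; \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y \mid X_{(\mathcal{A}\cup\mathcal{B})^c}) - I(X_{\mathcal{A}}; \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y \mid X_{\mathcal{A}^c}) \\
& + I(Y_{\mathcal{A}}; \hat{Y}_{\mathcal{A}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}) + I(Y_{\mathcal{B}}; \hat{Y}_{\mathcal{B}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{\mathcal{B}^c}) \\
={}& I(X_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} \mid X_{\mathcal{A}^c}, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y) + I(X_{\mathcal{B}}; X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}} \mid X_{(\mathcal{A}\cup\mathcal{B})^c}, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y) + I(\hat{Y}_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} \mid X_{\mathcal{N}}, Y, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}) \\
={}& I(X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}}; \hat{Y}_{\mathcal{B}} \mid X_{\mathcal{A}^c}, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y) + I(X_{\mathcal{B}}; X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}} \mid X_{(\mathcal{A}\cup\mathcal{B})^c}, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y) \\
={}& I(X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}}; X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}} \mid X_{(\mathcal{A}\cup\mathcal{B})^c}, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})^c}, Y),
\end{align*}

which proves the lemma.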
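The derivations in these proofs repeatedly apply the chain rule for conditional mutual information, $I(X; Y, Z \mid W) = I(X; Y \mid W) + I(X; Z \mid Y, W)$, and the nonnegativity of conditional mutual information. The following is a minimal numerical sketch of that identity on a randomly generated joint distribution. It is an illustration only and not part of the original proofs; the helper names (`random_joint`, `marginal`, `entropy`, `cond_mutual_info`) and the four generic variables are assumptions made for this example and do not model the relay channel.

```python
import itertools
import math
import random


def random_joint(shape, seed=0):
    """Random joint pmf over a product alphabet, as {outcome tuple: probability}."""
    rng = random.Random(seed)
    outcomes = list(itertools.product(*(range(n) for n in shape)))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def marginal(pmf, axes):
    """Marginal pmf of the variables whose positions are listed in `axes`."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(pmf, axes):
    """Joint entropy (in bits) of the variables in `axes`; zero for empty `axes`."""
    return -sum(p * math.log2(p) for p in marginal(pmf, axes).values() if p > 0)


def cond_mutual_info(pmf, a, b, c=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), for disjoint index tuples."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


# Hypothetical variables: positions 0, 1, 2, 3 play the roles of X, Y, Z, W.
pmf = random_joint((2, 3, 2, 2), seed=1)

# Chain rule: I(X; Y, Z | W) = I(X; Y | W) + I(X; Z | Y, W).
lhs = cond_mutual_info(pmf, (0,), (1, 2), (3,))
rhs = (cond_mutual_info(pmf, (0,), (1,), (3,))
       + cond_mutual_info(pmf, (0,), (2,), (1, 3)))

print(abs(lhs - rhs) < 1e-9)  # the two sides agree up to floating-point error
```

Mutual information is written here as a combination of joint entropies, so the chain rule reduces to cancelling entropy terms, which is the same bookkeeping the proofs carry out symbolically.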
Appendix C
Proof of Lemma 4.8

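For any disjoint $\mathcal{A}$ and $\mathcal{B}$, and any $\mathcal{S} \subseteq \mathcal{A} \cup \mathcal{B}$, let $\mathcal{S}_1 = \mathcal{S} \cap \mathcal{A}$ and $\mathcal{S}_2 = \mathcal{S} \cap \mathcal{B}$. Then, we have

\begin{align*}
R_{\mathcal{A}\cup\mathcal{B}}(\mathcal{S})
={}& I(X, X_{\mathcal{S}}; \hat{Y}_{(\mathcal{A}\cup\mathcal{B})\setminus\mathcal{S}}, Y \mid X_{(\mathcal{A}\cup\mathcal{B})\setminus\mathcal{S}}) - I(Y_{\mathcal{S}}; \hat{Y}_{\mathcal{S}} \mid X, X_{\mathcal{A}\cup\mathcal{B}}, \hat{Y}_{(\mathcal{A}\cup\mathcal{B})\setminus\mathcal{S}}, Y) \\
={}& I(X, X_{\mathcal{S}_1}, X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X_{\mathcal{A}\setminus\mathcal{S}_1}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_1\cup\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1\cup\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
={}& I(X, X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X_{\mathcal{A}\setminus\mathcal{S}_1}, X_{\mathcal{B}\setminus\mathcal{S}_2}) + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) \\
& - \big[ I(Y_{\mathcal{S}_1}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) + I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \big] \\
={}& I(X, X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y \mid X_{\mathcal{A}\setminus\mathcal{S}_1}) + I(X, X_{\mathcal{S}_1}; X_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2} \mid X_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& - \big[ I(Y_{\mathcal{S}_1}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) - I(X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \big] \\
& + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
={}& \big[ I(X, X_{\mathcal{S}_1}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y \mid X_{\mathcal{A}\setminus\mathcal{S}_1}) - I(Y_{\mathcal{S}_1}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \big] \\
& + I(X, X_{\mathcal{S}_1}; X_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2} \mid X_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) + I(X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
={}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X, X_{\mathcal{S}_1}; X_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2} \mid X_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) + I(X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y). \tag{51}
\end{align*}

When $\mathcal{S}_2 = \mathcal{B}$, following (51), we have

\begin{align*}
R_{\mathcal{A}\cup\mathcal{B}}(\mathcal{S})
={}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X_{\mathcal{B}}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& + I(X_{\mathcal{B}}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y \mid X, X_{\mathcal{A}}) - I(Y_{\mathcal{B}}; \hat{Y}_{\mathcal{B}} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, Y) \\
={}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X_{\mathcal{B}}; \hat{Y}_{\mathcal{A}}, Y \mid X, X_{\mathcal{A}}) - I(Y_{\mathcal{B}}; \hat{Y}_{\mathcal{B}} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, Y) \\
={}& R_{\mathcal{A}}(\mathcal{S}_1) + K_{\mathcal{A},\mathcal{B}}(\mathcal{B}).
\end{align*}

Generally, for any $\mathcal{S}_2 \subseteq \mathcal{B}$, continuing (51), we have

\begin{align*}
R_{\mathcal{A}\cup\mathcal{B}}(\mathcal{S})
\geq{}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X_{\mathcal{B}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
={}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
={}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X_{\mathcal{B}\setminus\mathcal{S}_2}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_1} \mid X, X_{\mathcal{A}}, \hat{Y}_{\mathcal{A}\setminus\mathcal{S}_1}, Y) \\
& + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
\geq{}& R_{\mathcal{A}}(\mathcal{S}_1) + I(X_{\mathcal{S}_2}; \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y \mid X, X_{\mathcal{A}}, X_{\mathcal{B}\setminus\mathcal{S}_2}) - I(Y_{\mathcal{S}_2}; \hat{Y}_{\mathcal{S}_2} \mid X, X_{\mathcal{A}}, X_{\mathcal{B}}, \hat{Y}_{\mathcal{A}}, \hat{Y}_{\mathcal{B}\setminus\mathcal{S}_2}, Y) \\
={}& R_{\mathcal{A}}(\mathcal{S}_1) + K_{\mathcal{A},\mathcal{B}}(\mathcal{S}_2).
\end{align*}

This completes the proof of Lemma 4.8.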